Deleting entries in Azure Table Storage that are a certain number of days old (automation) - azure

I want to automate the deletion of entries older than one month from Azure Table Storage.
Right now my Azure Table Storage holds about a year of entries, which I delete manually. For the future, I need to automate this process so that entries older than one month are deleted automatically.

If you have the PartitionKey and the RowKey, you can delete entities directly, as documented here: https://msdn.microsoft.com/en-us/library/dd135727.aspx
Otherwise you will need to query the entities first, obtain their (PartitionKey, RowKey) pairs, and then delete them.
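For illustration, a direct delete when both keys are already known might look roughly like the sketch below. This assumes the classic WindowsAzure.Storage .NET client and an already-initialised CloudTable named table; the key values are placeholders:
// Sketch: delete a single entity when PartitionKey and RowKey are known.
// ETag = "*" means "delete regardless of the entity's current version".
var entity = new DynamicTableEntity("myPartitionKey", "myRowKey") { ETag = "*" };
table.Execute(TableOperation.Delete(entity));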

Basically, to delete old entities in Azure Table Storage, you can first query them by filtering on DateTime properties, and then delete them in a loop.
And regarding
but for the future, I need to automate this process
If you have an Azure App Service, you can schedule a WebJob to satisfy this requirement.
Additionally, you can implement the delete operation in an Azure Function App and use Azure Scheduler to configure a schedule that calls your API.
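As a rough illustration of the scheduling part (not from the original answer), a timer-triggered Azure Function could run the cleanup once a day. The function name, the CRON schedule, and the DeleteEntitiesOlderThan helper are placeholders you would supply yourself:
// Minimal sketch of a timer-triggered Azure Function (Microsoft.Azure.WebJobs SDK assumed).
using System;
using Microsoft.Azure.WebJobs;
using Microsoft.Extensions.Logging;

public static class CleanupOldEntities
{
    // Runs every day at 03:00 UTC; adjust the CRON expression as needed.
    [FunctionName("CleanupOldEntities")]
    public static void Run([TimerTrigger("0 0 3 * * *")] TimerInfo timer, ILogger log)
    {
        var cutoff = DateTimeOffset.UtcNow.AddMonths(-1);
        log.LogInformation($"Deleting entities older than {cutoff:u}");

        // Query-and-delete logic goes here, e.g. a helper of your own:
        // DeleteEntitiesOlderThan("mytable", cutoff);
    }
}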
If you have any further concerns, please feel free to let me know.

You need to query the table, find the entities whose Timestamp is older than one month, and delete all of them. Batch operations may be useful here, since they let you delete multiple entities with the same partition key in a single request.
Query entities REST API: https://msdn.microsoft.com/en-us/library/azure/dd179421.aspx
Delete entity REST API: https://msdn.microsoft.com/en-us/library/azure/dd135727.aspx
Perform entity group transaction in REST API: https://msdn.microsoft.com/en-us/library/azure/dd894038.aspx
If you're using C#, you can refer to this documentation: https://azure.microsoft.com/en-us/documentation/articles/storage-dotnet-how-to-use-tables/#delete-an-entity
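To make that concrete, here is a minimal sketch (not from the original answer) assuming the classic WindowsAzure.Storage .NET client, System.Linq, and an already-initialised CloudTable named table:
// Query everything whose Timestamp is older than one month, then delete it
// in batches of up to 100 entities per partition key.
var cutoff = DateTimeOffset.UtcNow.AddMonths(-1);
var query = new TableQuery<DynamicTableEntity>().Where(
    TableQuery.GenerateFilterConditionForDate("Timestamp", QueryComparisons.LessThan, cutoff));

TableContinuationToken token = null;
do
{
    var segment = table.ExecuteQuerySegmented(query, token);
    token = segment.ContinuationToken;

    // Entities in one batch must share a PartitionKey, and a batch holds at most 100 operations.
    foreach (var group in segment.Results.GroupBy(e => e.PartitionKey))
    {
        var batch = new TableBatchOperation();
        foreach (var entity in group)
        {
            batch.Delete(entity);
            if (batch.Count == 100)
            {
                table.ExecuteBatch(batch);
                batch = new TableBatchOperation();
            }
        }
        if (batch.Count > 0)
        {
            table.ExecuteBatch(batch);
        }
    }
} while (token != null);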

Take a look at the following PowerShell solution for querying a storage table. You can incorporate it into an automation solution that runs on a regular basis.
https://github.com/chriseyre2000/Powershell/tree/master/Azure2

This thread might be a little old, but I recently wrote a library to manage the Azure Tables lifecycle.
Source/docs: https://github.com/pflajszer/AzureTablesLifecycleManager
Usage for your use case (more examples are provided on the source page):
Using IQueryBuilder:
public async Task<DataTransferResponse<T>> DoSomethingWithDataOlderThanAYearUsingQueryBuilder<T>(int option) where T : class, ITableEntity, new()
{
    // this will return all the tables since it's an empty query:
    var tableQuery = new QueryBuilder();

    // this will return all the data older than 1 year ago:
    var dataQuery = new QueryBuilder()
        .AppendCondition(ODataPredefinedFilters.TimestampLessThanOrEqual(DateTime.Now.AddYears(-1)));

    var dtr = new DataTransferResponse<T>();

    switch (option)
    {
        case 1:
            // this will move all the data that match the above filters to a new table:
            var newTableName = "someNewTable";
            newTableName.EnsureValidAzureTableName();
            dtr = await _api.MoveDataBetweenTablesAsync<T>(tableQuery, dataQuery, newTableName);
            break;
        case 2:
            // ...or delete it permanently:
            dtr = await _api.DeleteDataFromTablesAsync<T>(tableQuery, dataQuery);
            break;
        case 3:
            // ...or just fetch the data:
            dtr = await _api.GetDataFromTablesAsync<T>(tableQuery, dataQuery);
            break;
        default:
            break;
    }

    return dtr;
}
Using LINQ:
public async Task<DataTransferResponse<T>> DoSomethingWithDataOlderThanAYearUsingLINQExpression<T>(int option) where T : class, ITableEntity, new()
{
    // this query will return all the tables:
    Expression<Func<TableItem, bool>> tableQuery = x => true;

    // this query will return all data in the above tables that matches the condition (all data older than 1 year ago):
    Expression<Func<T, bool>> dataQuery = x => x.Timestamp < DateTime.Now.AddYears(-1);

    var dtr = new DataTransferResponse<T>();

    switch (option)
    {
        case 1:
            // Moving the data to a new table:
            var newTableName = "newTableName";
            newTableName.EnsureValidAzureTableName();
            dtr = await _api.MoveDataBetweenTablesAsync<T>(tableQuery, dataQuery, newTableName);
            break;
        case 2:
            // this call will delete the data that match the above filters:
            dtr = await _api.DeleteDataFromTablesAsync<T>(tableQuery, dataQuery);
            break;
        case 3:
            // ...or just fetch the data:
            dtr = await _api.GetDataFromTablesAsync<T>(tableQuery, dataQuery);
            break;
        default:
            break;
    }

    return dtr;
}

Related

Changing a value in an Azure Cosmos DB

I've inherited a project at work that uses Azure Cosmos DB. It's completely new to me. In the CosmosDB, we have a bunch of user preferences that are saved. I've discovered a typo in the settings that I need to fix. However, I cannot figure out how to modify the value.
So far I've found the query explorer and I want to run this query:
Update c
set c.Setting = REPLACE(c.Setting, 'N*m', 'N-m')
but query explorer only supports select, not update.
I tried to use Azure Storage Explorer, but when I try to access the document I get nothing except a modal saying "Hold on! We are still working on this." Seriously Microsoft?
My current thinking is to upload a stored procedure and run that, but I'm not sure where to start. My other thought is to write a small C# application that iterates through each user document and updates them individually. Something like this:
currId = 0;
databaseId = ...;
collectionId = ...;
collectionLink = ...;

while (currId < maxUserId)
{
    var response = await client.ReadDocumentAsync(UriFactory.CreateDocumentUri(databaseId, collectionId, currId.ToString()));
    if (response.Resource != null)
    {
        var upserted = response.Resource;
        upserted.SetPropertyValue("Setting", "N-m");
        response = await client.UpsertDocumentAsync(collectionLink, upserted);
    }
    currId++;
}
But boy if that doesn't seem like a dumb idea...
What's the best way to update a single value in a CosmosDB Document?

How to Speed Up Contract API CustomerID Search?

I'm trying to search the existing Customers and return the CustomerID if it exists. This is the code I'm using which works:
var CustomerToFind = new Customer
{
    MainContact = new Contact
    {
        Email = new StringSearch { Value = emailIn }
    }
};
var sw = new Stopwatch();
sw.Start();
//see if any results
var result = (Customer)soapClient.Get(CustomerToFind);
sw.Stop();
Debug.WriteLine(sw.ElapsedMilliseconds);
However, I'm finding it extremely slow, to the point of being unusable. For example, on the DEMO dataset, on my i7-6700k @ 4 GHz with 24 GB RAM and an SSD, running SQL Server 2016 Developer Edition locally, a simple email search takes 3-4 seconds. On my production dataset with 10k Customer records, it takes over 60 seconds and times out.
Is this typical when using contract-based SOAP? Screen-based SOAP seems much faster, almost instant. If I perform a SQL SELECT on the database tables in SQL Server Management Studio, the result also returns instantly.
Is there a quicker way to check whether a Customer with email address = "test@test.com" exists and return the CustomerID?
Try using GetList instead of Get. It's better suited for "search for something" scenarios.
When using GetList, depending on which endpoint you're using, there are two more optimizations. In the Default/5.30.001 endpoint there's a second parameter to GetList which you should set to false. In the Default/6.00.001 endpoint there's no second parameter, but there is an additional property on the entity itself, called ReturnBehavior. Either set it to OnlySpecified and then add *Return values for the required fields, like this:
var CustomerToFind = new Customer
{
    ReturnBehavior = ReturnBehavior.OnlySpecified,
    CustomerID = new StringReturn(),
    MainContact = new Contact
    {
        Email = new StringSearch { Value = emailIn }
    }
};
or set it to OnlySystem and then use the ID of the returned entity to request the full entity.
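For completeness, a rough usage sketch of the GetList call itself (this is an assumption, not part of the original answer; it presumes the contract-based client returns an array of entities, that System.Linq is available, and that CustomerID was requested as shown above):
// Hypothetical sketch: run the search and read back only the CustomerID.
var results = soapClient.GetList(CustomerToFind);
var match = results.OfType<Customer>().FirstOrDefault();
if (match != null)
{
    Debug.WriteLine($"Found CustomerID: {match.CustomerID?.Value}");
}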

Data tracking in DocumentDB

I am trying to keep a history of the data (at least one step back) in DocumentDB.
For example, suppose I have a property called Name in a document with the value "Pieter". If I change it to "Sam", I need to keep a record that it was "Pieter" previously.
As of now I am thinking of a pre-trigger. Any other solutions?
Cosmos DB (formerly DocumentDB) now offers change tracking via the Change Feed. With the Change Feed, you can listen for changes on a particular collection, ordered by time of modification within a partition.
Change feed is accessible via:
Azure Functions
DocumentDB (SQL) SDK
Change Feed Processor Library
For example, here's a snippet from the Change Feed documentation, on reading from the Change Feed, for a given partition (full code example in the doc here):
IDocumentQuery<Document> query = client.CreateDocumentChangeFeedQuery(
    collectionUri,
    new ChangeFeedOptions
    {
        PartitionKeyRangeId = pkRange.Id,
        StartFromBeginning = true,
        RequestContinuation = continuation,
        MaxItemCount = -1,
        // Set reading time: only show change feed results modified since StartTime
        StartTime = DateTime.Now - TimeSpan.FromSeconds(30)
    });

while (query.HasMoreResults)
{
    FeedResponse<dynamic> readChangesResponse = query.ExecuteNextAsync<dynamic>().Result;

    foreach (dynamic changedDocument in readChangesResponse)
    {
        Console.WriteLine("document: {0}", changedDocument);
    }

    checkpoints[pkRange.Id] = readChangesResponse.ResponseContinuation;
}
If you're trying to build an audit log, I'd suggest looking into Event Sourcing. Building your domain from events ensures a correct log. See https://msdn.microsoft.com/en-us/library/dn589792.aspx and http://www.martinfowler.com/eaaDev/EventSourcing.html
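As a tiny illustration of the idea (purely illustrative types, not taken from the linked articles): instead of overwriting the Name property, you append an event per change and derive the current value by replaying the events:
// Illustrative only: each change is stored as an immutable event,
// and the current value is a projection over the event stream.
using System;
using System.Collections.Generic;
using System.Linq;

var events = new List<NameChanged>
{
    new("user-1", OldName: null, NewName: "Pieter", At: DateTime.UtcNow.AddDays(-1)),
    new("user-1", OldName: "Pieter", NewName: "Sam", At: DateTime.UtcNow)
};

// Replaying (here: simply taking the latest event) yields the current state,
// while every previous value remains available for auditing.
var currentName = events.OrderBy(e => e.At).Last().NewName;   // "Sam"
Console.WriteLine(currentName);

public record NameChanged(string DocumentId, string OldName, string NewName, DateTime At);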

Azure Storage Table Does not return whole partition

I ran into a situation in production where
CloudContext.TableData.Where( A => A.PartitionKey == "MYKEY").ToList();
where TableData is
public DataServiceQuery<T> TableData { get { return CreateQuery<T>( _TableName ); } }
does not return the whole partition (I have less than 1000 records there).
In my case it returns 367 records while in VS2010 Server Explorer or in Azure Storage Explorer I get 414 records (condition is the same).
Has anyone experienced the same problem?
Also, if I change the query and add the RowKey to the condition, I get the required record with no problem.
You need to understand the Table service a little better. The official documentation lists other conditions that affect the number of records returned. If you want to retrieve the whole partition, you have to check the result for a continuation token and use that token to execute the same query again and again until all the results have been returned.
You can use an approach similar to the following:
private IEnumerable<MyEntityType> GetAllEntities()
{
    TableContinuationToken continuationToken = null; // null means "start from the beginning"
    do
    {
        var result = this._tables.GetSegmentedEntities(100, continuationToken);

        foreach (var ufs in result.Results)
        {
            yield return new MyEntityType(ufs.RowKey, ufs.WhateverOtherPropertyINeed);
        }

        // A segment can come back empty but still carry a continuation token,
        // so keep going until the token is null.
        continuationToken = result.ContinuationToken;
    } while (continuationToken != null);
}
Where GetSegmentedEntities(100, continuationToken) is defined as:
public TableQuerySegment<MyEntityType> GetSegmentedEntities(int pageSize, TableContinuationToken token)
{
    var partKey = "My_Desired_Partition_key_passed_via_Const_or_method_Param";

    TableQuery<MyEntityType> query = new TableQuery<MyEntityType>()
        .Where(TableQuery.GenerateFilterCondition("PartitionKey", QueryComparisons.Equal, partKey));
    query.TakeCount = pageSize;

    return this.azureTableReference.ExecuteQuerySegmented<MyEntityType>(query, token);
}
You can use and modify this code for your case.
This is a known and documented behavior. The Table service API will return either 1000 entities or as many entities as it can fetch within 5 seconds. If the query takes longer than 5 seconds to execute, it returns a continuation token.
With the addition of the RowKey you are making the query more specific, and hence faster, so you get all the entities.
See Timeouts and Pagination on MSDN for details.
If you are getting partial result sets, one of three factors is usually involved:
i) more than 1000 records match the filter;
ii) the query took more than 5 seconds;
iii) the query crosses a partition boundary.
Since you have fewer than 1000 records, the first factor won't be an issue, and since you are filtering on PartitionKey equality, the third won't cause any problem either. You are facing this problem because of the second factor.
To handle this you need to work with the continuation token. You can refer to this link for more info.

Add or replace entity in Azure Table Storage

I'm working with Windows Azure Table Storage and have a simple requirement: add a new row, overwriting any existing row with that PartitionKey/RowKey. However, saving the changes always throws an exception, even if I pass in the ReplaceOnUpdate option:
tableServiceContext.AddObject(TableName, entity);
tableServiceContext.SaveChangesWithRetries(SaveChangesOptions.ReplaceOnUpdate);
If the entity already exists it throws:
System.Data.Services.Client.DataServiceRequestException: An error occurred while processing this request. ---> System.Data.Services.Client.DataServiceClientException: <?xml version="1.0" encoding="utf-8" standalone="yes"?>
<error xmlns="http://schemas.microsoft.com/ado/2007/08/dataservices/metadata">
<code>EntityAlreadyExists</code>
<message xml:lang="en-AU">The specified entity already exists.</message>
</error>
Do I really have to manually query for the existing row first and call DeleteObject on it? That seems very slow. Surely there is a better way?
As you've found, you can't just add another item that has the same row key and partition key, so you will need to run a query to check to see if the item already exists. In situations like this I find it helpful to look at the Azure REST API documentation to see what is available to the storage client library. You'll see that there are separate methods for inserting and updating. The ReplaceOnUpdate only has an effect when you're updating, not inserting.
While you could delete the existing item and then add the new one, you could just update the existing one (saving you one round trip to storage). Your code might look something like this:
var existsQuery = from e in tableServiceContext.CreateQuery<MyEntity>(TableName)
                  where e.PartitionKey == objectToUpsert.PartitionKey
                     && e.RowKey == objectToUpsert.RowKey
                  select e;

MyEntity existingObject = existsQuery.FirstOrDefault();

if (existingObject == null)
{
    tableServiceContext.AddObject(TableName, objectToUpsert);
}
else
{
    existingObject.Property1 = objectToUpsert.Property1;
    existingObject.Property2 = objectToUpsert.Property2;
    tableServiceContext.UpdateObject(existingObject);
}

tableServiceContext.SaveChangesWithRetries(SaveChangesOptions.ReplaceOnUpdate);
EDIT: While correct at the time of writing, with the September 2011 update Microsoft updated the Azure Table API to include two upsert commands, Insert or Replace Entity and Insert or Merge Entity.
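With the newer storage client these map to single table operations; a minimal sketch (assuming a CloudTable named table and an ITableEntity named entity; the replace flavour is shown in full in a later answer):
// InsertOrReplace overwrites the stored entity entirely;
// InsertOrMerge keeps any existing properties the new entity does not set.
table.Execute(TableOperation.InsertOrReplace(entity));
table.Execute(TableOperation.InsertOrMerge(entity));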
To operate on an existing object that is NOT managed by the TableContext, whether with Delete or with SaveChanges using the ReplaceOnUpdate option, you need to call AttachTo to attach the object to the TableContext, instead of calling AddObject, which instructs the TableContext to attempt an insert.
http://msdn.microsoft.com/en-us/library/system.data.services.client.dataservicecontext.attachto.aspx
In my case it was not possible to remove the entity first, so I do it like this. This results in one transaction to the server, which first removes the existing object and then adds the new one, removing the need to copy property values:
var existing = from e in _ServiceContext.AgentTable
               where e.PartitionKey == item.PartitionKey
                  && e.RowKey == item.RowKey
               select e;

_ServiceContext.IgnoreResourceNotFoundException = true;

var existingObject = existing.FirstOrDefault();
if (existingObject != null)
{
    _ServiceContext.DeleteObject(existingObject);
}

_ServiceContext.AddObject(AgentConfigTableServiceContext.AgetnConfigTableName, item);
_ServiceContext.SaveChangesWithRetries();

_ServiceContext.IgnoreResourceNotFoundException = false;
Insert or Replace (and Insert or Merge) was added to the API in September 2011. Here is an example using the Storage API 2.0, which is easier to understand than the way it is done in the 1.7 API and earlier.
public void InsertOrReplace(ITableEntity entity)
{
    retryPolicy.ExecuteAction(
        () =>
        {
            try
            {
                TableOperation operation = TableOperation.InsertOrReplace(entity);
                cloudTable.Execute(operation);
            }
            catch (StorageException e)
            {
                string message = "InsertOrReplace entity failed.";
                if (e.RequestInformation.HttpStatusCode == 404)
                {
                    message += " Make sure the table is created.";
                }
                // do something with message
            }
        });
}
The Storage API does not allow more than one operation per entity (delete+insert) in a group transaction:
An entity can appear only once in the transaction, and only one operation may be performed against it.
see MSDN: Performing Entity Group Transactions
So in fact you need to read first and decide on insert or update.
You may use the UpsertEntity and UpsertEntityAsync methods of the TableClient in the official Microsoft Azure.Data.Tables package.
A fully working example is available at https://github.com/Azure-Samples/msdocs-azure-data-tables-sdk-dotnet/blob/main/2-completed-app/AzureTablesDemoApplicaton/Services/TablesService.cs:
public void UpsertTableEntity(WeatherInputModel model)
{
    TableEntity entity = new TableEntity();
    entity.PartitionKey = model.StationName;
    entity.RowKey = $"{model.ObservationDate} {model.ObservationTime}";

    // The other values are added like items to a dictionary
    entity["Temperature"] = model.Temperature;
    entity["Humidity"] = model.Humidity;
    entity["Barometer"] = model.Barometer;
    entity["WindDirection"] = model.WindDirection;
    entity["WindSpeed"] = model.WindSpeed;
    entity["Precipitation"] = model.Precipitation;

    _tableClient.UpsertEntity(entity);
}
