Is there an equivalent to TransactionScope that you can use with Azure Table Storage?
What I'm trying to do is the following:
using (TransactionScope scope = new TransactionScope()) {
    account.balance -= 10;
    purchaseOrders.Add(order);
    accountDataSource.SaveChanges();
    purchaseOrdersDataSource.SaveChanges();
    scope.Complete();
}
If for some reason saving the account succeeds but saving the purchase order fails, I don't want the account balance to be decremented.
Within a single table and single partition, you may write multiple rows in an entity group transaction. There's no built-in transaction mechanism when crossing partitions or tables.
That said: remember that tables are schema-less, so if you really need a transaction, you could store both your account row and your purchase order row in the same table and same partition, and do a single (transactional) save.
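For illustration, here is a minimal sketch of that single-table, single-partition save using the classic SDK's TableBatchOperation; accountEntity, orderEntity, and table are placeholders for your own objects, and both entities must share the same PartitionKey:

// Sketch only: accountEntity (carrying the ETag from its original read) and orderEntity
// are assumed to derive from TableEntity and to use the same PartitionKey, e.g. the customer id.
TableBatchOperation batch = new TableBatchOperation();
batch.Replace(accountEntity);   // account row with the decremented balance
batch.Insert(orderEntity);      // new purchase order row in the same partition
table.ExecuteBatch(batch);      // all operations in the batch succeed or fail together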
I am fairly new to Azure cloud development.
I have a function app coded in C# that:
1. Gets a record from a storage table
2. Deletes that record
3. Updates fields on that record (including the partition key)
4. Inserts the new record into the storage table
I am experiencing data loss when an exception is thrown on the insert step.
I am wondering how, if step 4 throws an exception, I can then roll back step 2. If that is not possible, how would I prevent the data loss? I'm unable to use the built-in table operations that would replace the entity, because I am changing the partition key.
I understand that the hard part in all of this is the partition key update, since the system was designed so that each transaction or operation works on records with the same partition key.
I have looked through the Table Service REST API at all the table operations I thought could be helpful:
Insert Entity
Update Entity
Merge Entity
Insert or Update Entity
Insert or Replace Entity
You can't use a transaction here because the partition key changes, so you'll have to look at a solution outside of Table Storage itself.
What you could do is create the new record before deleting the old one. That way you're assured that you won't lose any data (as long as you verify that the request to create the record succeeded).
You could take it one step further by making it an asynchronous process: have a storage queue or Service Bus queue hold a message containing the information of the request, and have a function app (or anything else) handle those messages. That way the request remains retryable if any transient errors occur over a longer timespan, as sketched below.
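A rough sketch of that queue-based approach using the in-process Functions bindings; the RekeyMessage type, the queue and table names, and the function name are all made up for illustration:

// Hypothetical message describing one re-key request.
public class RekeyMessage
{
    public string OldPartitionKey { get; set; }
    public string NewPartitionKey { get; set; }
    public string RowKey { get; set; }
}

public static class RekeyFunctions
{
    [FunctionName("RekeyEntity")]
    public static void Run(
        [QueueTrigger("rekey-requests")] RekeyMessage msg,
        [Table("customers")] CloudTable table)
    {
        var retrieve = TableOperation.Retrieve<DynamicTableEntity>(msg.OldPartitionKey, msg.RowKey);
        var original = (DynamicTableEntity)table.Execute(retrieve).Result;
        if (original == null) return; // already moved on an earlier attempt

        // 1. Create the copy under the new PartitionKey first (InsertOrReplace keeps retries idempotent).
        var copy = new DynamicTableEntity(msg.NewPartitionKey, msg.RowKey) { Properties = original.Properties };
        table.Execute(TableOperation.InsertOrReplace(copy));

        // 2. Delete the original only once the copy exists. If this step fails, the queue message
        //    is retried and the worst case is a temporary duplicate, never data loss.
        table.Execute(TableOperation.Delete(original));
    }
}

Because the insert happens before the delete, a failure at any point leaves at least one copy of the data in the table, and the queue retry picks the work back up.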
Following the steps in the question, I was able to reproduce the data loss issue: starting from an existing record in the table, once an exception occurred on the insert, the data was lost as described in the question.
Cosmos DB doesn't allow updating the value of the PartitionKey directly: you first have to delete the record and then create a new record with the new PartitionKey value.
To prevent data loss with the built-in TableOperations, only call Execute() on the delete operation after the earlier steps have completed successfully.
// prepare the delete for the previously retrieved entity, but don't execute it yet
TableOperation delOperation = TableOperation.Delete(getCustomer);
You can clone or create a deep copy of the retrieved object using a copy constructor:
public Customer(Customer customer)
{
PartitionKey = customer.PartitionKey;
RowKey = customer.RowKey;
customerName = customer.customerName;
}
Create the copy of the object with the new PartitionKey value:
Customer c = new(getCustomer)
{
PartitionKey = "India"
};
Once step 4 from the question has completed successfully, we can commit the delete operation.
In my test I still got an exception on the insert step, but when I looked at the table, no data had been lost.
Below is the code snippet to prevent data loss:
TableOperation _insOperation = TableOperation.Insert(c);
var insResult = _table.Execute(_insOperation);
if (insResult.HttpStatusCode == 204)   // insert succeeded, so it is now safe to delete the original
{
    var delResult = _table.Execute(delOperation);
}
I added a lock flag in an Azure storage table hoping only one Azure function gets access to its data at a time. Below is how I did it:
Azure storage table with an item:
{
    partitionKey: 'foo_id',
    foo: 'foo_data',
    bar: 'bar_data',
    isLocked: false
}
I then have a queue-triggered function that processes and updates the foo/bar data in the table only if the item is not locked (isLocked == false).
The queue-triggered function goes like this:
import azure.functions as func

def main(msg: func.QueueMessage):
    is_locked = get_property_from_table('foo_id', 'isLocked')
    if not is_locked:
        lock_task_in_table('foo_id')    # isLocked = true
        # continue with business logic
        # that retrieves/updates foo & bar data in the task
        unlock_task_in_table('foo_id')  # isLocked = false
    else:
        pass  # do nothing
However, when several messages trigger functions concurrently, it can still happen that two or more functions read the table item and run the business logic at the same time. Is there any way I can allow only one Azure Function to access my Azure table item at a time?
• You can do so by taking a lease on an Azure blob alongside the table storage. You can use this technique to ensure that only one Azure Function accesses the table partition at a given point in time and that updates to the data stay consistent. For this purpose, create a blob and a table entry, and give the blob a name that matches the key of your table entity.
Following is the code for lease-protected table access:
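(A minimal sketch along those lines with the classic storage SDK; the connection string, the "entity-locks" container, the "entities" table, and the "foo_id" keys are placeholders.)

string connectionString = "<storage-connection-string>"; // placeholder
CloudStorageAccount account = CloudStorageAccount.Parse(connectionString);

// The blob is named after the table entity's key, so a lease on the blob acts as a lock on that entity.
CloudBlobContainer container = account.CreateCloudBlobClient().GetContainerReference("entity-locks");
container.CreateIfNotExists();
CloudBlockBlob lockBlob = container.GetBlockBlobReference("foo_id");
if (!lockBlob.Exists())
    lockBlob.UploadText(string.Empty);

CloudTable table = account.CreateCloudTableClient().GetTableReference("entities");

// The lease is taken for a definite period (15-60 seconds); while it is held, no other
// function can acquire a lease on the same blob -- a second caller gets a 409 and must retry.
string leaseId = lockBlob.AcquireLease(TimeSpan.FromSeconds(30), null);
try
{
    var retrieved = table.Execute(TableOperation.Retrieve<DynamicTableEntity>("foo_id", "foo_id"));
    var entity = (DynamicTableEntity)retrieved.Result;
    entity.Properties["bar"] = new EntityProperty("bar_data_updated");
    table.Execute(TableOperation.Replace(entity));
}
finally
{
    lockBlob.ReleaseLease(AccessCondition.GenerateLeaseCondition(leaseId));
}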
• Thus, as you can see above, the try block accesses the entity in table storage and updates it, but because the blob is named after the key of the table entity, acquiring a lease on that blob effectively acquires a lock on that table entity. Since the lease is taken for a definite time period, access to the table entity is locked by the function for that duration. The output of the above code in the storage account will be a blob whose name matches the partition key of the table entity, plus an entry in the entities table in table storage.
In this way, you can create a lock on particular table entries through a blob lease. For more information, please refer to the link below:
https://www.azurefromthetrenches.com/acquiring-locks-on-table-storage/
What are some ways to optimize the retrieval of large numbers of entities (~250K) from a single partition from Azure Table Storage to a .NET application?
As far as I know, there are two ways to optimize the retrieval of large numbers of entities from a single partition from Azure Table Storage to a .NET application.
1. If you don't need to get all properties of the entity, I suggest using server-side projection.
A single entity can have up to 255 properties and be up to 1 MB in size. When you query the table and retrieve entities, you may not need all the properties and can avoid transferring data unnecessarily (to help reduce latency and cost). You can use server-side projection to transfer just the properties you need.
From: Azure Storage Table Design Guide: Designing Scalable and Performant Tables (Server-side projection)
For more details, you can refer to the following code:
string filter = TableQuery.GenerateFilterCondition(
"PartitionKey", QueryComparisons.Equal, "Sales");
List<string> columns = new List<string>() { "Email" };
TableQuery<EmployeeEntity> employeeQuery =
new TableQuery<EmployeeEntity>().Where(filter).Select(columns);
var entities = employeeTable.ExecuteQuery(employeeQuery);
foreach (var e in entities)
{
Console.WriteLine("RowKey: {0}, EmployeeEmail: {1}", e.RowKey, e.Email);
}
2. If you just want to display the table's data, you don't need to get all the entities at the same time.
You can retrieve part of the result set, and use the continuation token when you want to get the rest.
This will improve the table query performance.
A query against the table service may return a maximum of 1,000 entities at one time and may execute for a maximum of five seconds. If the result set contains more than 1,000 entities, if the query did not complete within five seconds, or if the query crosses the partition boundary, the Table service returns a continuation token to enable the client application to request the next set of entities. For more information about how continuation tokens work, see Query Timeout and Pagination.
From: Azure Storage Table Design Guide: Designing Scalable and Performant Tables (Retrieving large numbers of entities from a query)
By using continuation tokens explicitly, you can control when your application retrieves the next segment of data.
For more details, you can refer to the following code:
string filter = TableQuery.GenerateFilterCondition(
"PartitionKey", QueryComparisons.Equal, "Sales");
TableQuery<EmployeeEntity> employeeQuery =
new TableQuery<EmployeeEntity>().Where(filter);
TableContinuationToken continuationToken = null;
do
{
var employees = employeeTable.ExecuteQuerySegmented(
employeeQuery, continuationToken);
foreach (var emp in employees)
{
...
}
continuationToken = employees.ContinuationToken;
} while (continuationToken != null);
Besides, I suggest paying attention to the table partition scalability targets:
Target throughput for a single table partition (1 KB entities): up to 2,000 entities per second.
If you hit the scalability target for the partition, the storage service will throttle your requests.
I'm considering implementing an audit trail for my application using Table Storage.
I need to be able to log all actions for a specific customer and all actions for entities from that customer.
My first guess was to create a table for each customer (Audits_CustomerXXX), use the entity id as the partition key, and use the (DateTime.MaxValue.Ticks - DateTime.Now.Ticks).ToString("D19") value as the row key. This works great when my question is "what happened to a certain entity?". For instance, the audit of a purchase would have PartitionKey = "Purchases/12345" and the reversed timestamp as the RowKey.
But when I want a bird's-eye view of the entire customer, can I just query the table sorted by row key across partitions? Or is it better to create a secondary table to hold the data with different partition keys? Also, when using the (DateTime.MaxValue.Ticks - DateTime.Now.Ticks).ToString("D19") value, is there a way to prevent errors when two actions in the same partition happen in the same tick (unlikely, but who knows...)?
Thanks
You could certainly create a separate table for the bird's-eye view, but you really don't have to. Since Azure Tables are schema-less, you can keep this data in the same table as well. You would keep the PartitionKey as the reverse ticks and the RowKey as the entity id. Because you would be querying only on PartitionKey, you could also use a GUID as the RowKey; this will ensure that all entities are unique. Or you could append a GUID to your entity id and use that as the RowKey, as in the sketch below.
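A small sketch of that dual-row idea; auditTable (a CloudTable for the customer's audit table) and the property names are illustrative, and a '-' separator is used for the entity id here because '/' is not allowed in PartitionKey or RowKey values:

string reverseTicks = (DateTime.MaxValue.Ticks - DateTime.UtcNow.Ticks).ToString("D19");
string entityId = "Purchases-12345";

// Row 1: answers "what happened to this entity?" (the original per-entity design).
var perEntityRow = new DynamicTableEntity(entityId, reverseTicks);
perEntityRow.Properties["Action"] = new EntityProperty("PurchaseCreated");

// Row 2: the customer-wide, time-ordered view; appending a GUID keeps the RowKey
// unique even if two actions fall on the same tick.
var birdsEyeRow = new DynamicTableEntity(reverseTicks, entityId + "_" + Guid.NewGuid());
birdsEyeRow.Properties["Action"] = new EntityProperty("PurchaseCreated");

auditTable.Execute(TableOperation.Insert(perEntityRow));
auditTable.Execute(TableOperation.Insert(birdsEyeRow));   // a separate request: see the caveat below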
However, do keep in mind that because you're inserting two entities with different PartitionKey values, you will have to safeguard your code against possible network failures, as each entry will be a separate request to the Table service. The way we handle this in our application is to write the payload to a queue message and then process that message through a background process.
Are there any software patterns that would enable a transaction across multiple tables in Azure Table Storage?
I want to write (or delete) several entities in different tables in an atomic way, like...
try {
    write entity to table A
    write entity to table B
} catch {
    delete entity from table A
    delete entity from table B
}
During the above transaction I also want to prevent anyone from writing/deleting the same entities (same table, partition key and row key).
I know Azure Storage does not support this directly, so I'm looking for a pattern, perhaps using an additional table to "lock" entities in the transaction until it's complete. All writers would have to obtain a lock on the entities.
The only way to ensure that no one else modifies rows in a table while you are working on them is to add the overhead of blob leasing. You can have the one instance/thread grab the blob lease and do whatever it needs to. Then, when done, release the blob. If it fails to grab the lease, it either has to wait or try again later.
The other table-based operations, like pessimistic concurrency flags, will not actually prevent someone from modifying the records.