Azure Table Storage access time when inserting and reading

I'm writing a program that uploads data from CSV files to Azure tables and reads it back. The CSV files can have a varying number of columns and anywhere between 3k and 50k rows. What I need to do is upload that data into an Azure table. So far I have managed to both upload the data and retrieve it.
I'm using the REST API, and for uploading I'm creating XML batch requests with 100 rows per request. That works fine, except it takes a bit too long to upload, e.g. around 30 seconds for 3k rows. Is there any way to speed that up? I noticed that most of the time is spent processing the response (in the ReadToEnd() call). I read somewhere that setting the proxy to null could help, but it doesn't do much in my case.
I also found somewhere that it is possible to upload the whole XML request to a blob and then execute it from there, but I couldn't find any example of doing that.
using (Stream requestStream = request.GetRequestStream())
{
    requestStream.Write(content, 0, content.Length);
}
using (HttpWebResponse response = (HttpWebResponse)request.GetResponse())
{
    Stream dataStream = response.GetResponseStream();
    using (var reader = new StreamReader(dataStream))
    {
        String responseFromServer = reader.ReadToEnd();
    }
}
As for retrieving data from Azure tables, I managed to get 1000 entities per request. Even so, it takes me around 9 s for a CSV with 3k rows, and again most of the time is spent reading from the stream, in this part of the code (the ReadToEnd() call again):
response = request.GetResponse() as HttpWebResponse;
using (StreamReader reader = new StreamReader(response.GetResponseStream()))
{
    string result = reader.ReadToEnd();
}
Any tips?

Since you mentioned you are using the REST API directly, you have to write extra code and depend on your own methods to improve performance, unlike with the client library. In your case, using the storage client library would be best, as you can use its already-built features to expedite inserts, upserts, etc., as described here.
If you do use the storage client library and ADO.NET, the article below, written by the Windows Azure Table team, describes the supported ways to improve Azure access performance:
.NET and ADO.NET Data Service Performance Tips for Windows Azure Tables
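Either way, entity group transactions have hard limits that any batching code has to respect: at most 100 operations per batch, and all entities in a batch must share the same partition key. A minimal sketch of just the chunking step (in JavaScript for brevity; representing rows as plain objects with a `PartitionKey` property is an assumption):

```javascript
// Group rows by partition key, then split each group into batches of
// at most 100 entities -- the limit for one entity group transaction.
function toBatches(rows, maxBatchSize = 100) {
  const byPartition = new Map();
  for (const row of rows) {
    const key = row.PartitionKey;
    if (!byPartition.has(key)) byPartition.set(key, []);
    byPartition.get(key).push(row);
  }
  const batches = [];
  for (const group of byPartition.values()) {
    for (let i = 0; i < group.length; i += maxBatchSize) {
      batches.push(group.slice(i, i + maxBatchSize));
    }
  }
  return batches;
}
```

Each resulting batch can then be submitted as one request; issuing several batches concurrently is usually where the wall-clock win over row-by-row inserts comes from.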

Related

add header at top in csv file of append blob

I am creating a pipeline in Azure Data Factory where I am using a Function App as one of the activities to transform data and store it in an append blob container in CSV format. As I have 50 batches in a for loop, my function app processes the data 50 times, once per order. I am appending the header to the CSV file with the logic below.
// First I am creating the file as per business logic
// csveventcontent is my source data
var dateAndTime = DateTime.Now.AddDays(-1);
string FileDate = dateAndTime.ToString("ddMMyyyy");
string FileName = _config.ContainerName + FileDate + ".csv";
StringBuilder csveventcontent = new StringBuilder();
OrderEventService obj = new OrderEventService();
// Now I am checking if today's file exists, and if it doesn't we create it.
if (await appBlob.ExistsAsync() == false)
{
    await appBlob.CreateOrReplaceAsync(); //CreateOrReplace();
    // Append header
    csveventcontent.AppendLine(obj.GetHeader());
}
Now the problem is that the header is appended to the CSV file many times, and sometimes it is not at the top. This is probably because the function app runs 50 times in parallel.
How can I make sure the header appears only once, at the top?
I have tried Data Flow and Logic Apps as well but was unable to do it. If it can be handled through code, that would be easier, I guess.
I think you are right there. It's the concurrency of the function app that is causing the problem. The best approach would be to use a queue and process the messages one by one. Alternatively, you could use a distributed lock to ensure only one function writes to the file at a time; blob leases can be used for this.
The Lease Blob operation creates and manages a lock on a blob for write and delete operations. The lock duration can be 15 to 60 seconds, or can be infinite.
Refer: Lease Blob Request Headers
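To see why serializing the writers fixes the header problem, here is a minimal sketch (JavaScript, with an in-memory object standing in for the append blob; against real storage, the role of the mutex would be played by a blob lease or by a queue feeding a single consumer):

```javascript
// A minimal async mutex: tasks passed to run() execute strictly one
// after another, never interleaved.
class AsyncMutex {
  constructor() { this.tail = Promise.resolve(); }
  run(task) {
    const result = this.tail.then(task);
    this.tail = result.catch(() => {}); // keep the chain alive on failure
    return result;
  }
}

// Simulated append blob: the header must be written exactly once.
const blob = { exists: false, lines: [] };
const mutex = new AsyncMutex();

async function appendBatch(header, rows) {
  await mutex.run(async () => {
    if (!blob.exists) {      // this check-then-act is now safe, because
      blob.exists = true;    // no other writer can interleave here
      blob.lines.push(header);
    }
    blob.lines.push(...rows);
  });
}
```

Without the mutex, two concurrent callers can both observe `exists === false` and both write the header, which is exactly the duplicate-header symptom described above.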

How to check if two files have the same content

I am working on a Node.js application. I query an API endpoint and store the retrieved data in a database. Everything works well; however, there are instances where some data is not pushed to the database. In that case, what I normally do is manually query the endpoint with the date when that data was lost and retrieve it, since it is stored on a server that automatically deletes the data after 2 days. The API and database fields are identical.
The following is not the problem itself, but for context: I would like to automate this process by making the application retrieve all the data for the past 48 hours and save it in a .txt file inside the app. I will do the same with my MSSQL database, querying it for the data from the past 48 hours.
My question is, how can I check whether the contents of my api.txt file are the same as those of db.txt?
You could make use of buf.equals(), as detailed in the docs
const fs = require('fs');
var api = fs.readFileSync('api.txt');
var db = fs.readFileSync('db.txt');
// Returns bool
api.equals(db)
So that:
if (api.equals(db))
    console.log("equal")
else
    console.log("not equal")

Comparing data in an Azure Index to data that is about to be uploaded

I have an index in the Azure Cognitive Search service. I'm writing a program to automate uploading new data to this index, and I don't want to unnecessarily delete and re-create the index from scratch each time. Is there a way of comparing what is currently in the index with the data I am about to upload, without having to download that data first and compare it manually? I have been looking at the MS documentation and other articles but cannot see a way to do this comparison.
You can use the MergeOrUpload operation: if the document is not in the index it will be inserted, otherwise it will be updated.
Please make sure the IDs are the same, otherwise you'll end up always adding new items.
IndexAction.MergeOrUpload(
    new Customer()
    {
        Id = "....",
        UpdatedBy = new
        {
            Id = "..."
        }
    }
)
https://learn.microsoft.com/en-us/dotnet/api/microsoft.azure.search.models.indexactiontype?view=azure-dotnet
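To illustrate the point about IDs, merge-or-upload semantics boil down to the following, sketched here against a plain in-memory map standing in for the index (a hypothetical stand-in; the real service performs the per-field merge server-side):

```javascript
// In-memory stand-in for an index keyed by Id, illustrating
// merge-or-upload: insert when the key is absent, merge fields when present.
const index = new Map();

function mergeOrUpload(doc) {
  const existing = index.get(doc.Id);
  // A new Id always creates a new document -- hence the warning above:
  // if the Ids are not stable, every upload adds items instead of updating.
  index.set(doc.Id, existing ? { ...existing, ...doc } : { ...doc });
}
```

Fields omitted from the uploaded document are left as they were, which is what makes this cheaper than delete-and-recreate.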

How to avoid race condition when updating Azure Table Storage record

Azure Function utilising Azure Table Storage
I have an Azure Function which is triggered from Azure Service Bus topic subscription, let's call it "Process File Info" function.
The message on the subscription contains file information to be processed. Something similar to this:
{
    "uniqueFileId": "adjsdakajksajkskjdasd",
    "fileName": "mydocument.docx",
    "sourceSystemRef": "System1",
    "sizeBytes": 1024,
    ... and other data
}
The function carries out the following two operations:
Check the individual file storage table for the existence of the file. If it exists, update that file; if it's new, add the file to the storage table (stored on a per system | per fileId basis).
Capture metrics on the file size in bytes and store them in a second storage table, called metrics (constantly incrementing the byte count, stored on a per system | per year/month basis).
The following diagram gives a brief summary of my approach:
The difference between the individualFileInfo table and the fileMetric table is that the individual table has one record per file, whereas the metric table stores one record per month that is constantly updated (incremented) to gather the total bytes passed through the function.
Data in the fileMetrics table is stored as follows:
The issue...
Azure Functions are brilliant at scaling; in my setup I have a maximum of 6 of these functions running at any one time. Presuming each file message being processed is unique, updating (or inserting) the record in the individualFileInfo table works fine, as there are no race conditions.
However, updating the fileMetric table is proving problematic: say all 6 functions fire at once, they all intend to update the metrics table at the same time (incrementing the new-file counter or the existing-file counter).
I have tried using the ETag for optimistic updates, along with a little recursion to retry should a 412 response come back from the storage update (code sample below). But I can't seem to avoid this race condition. Has anyone any suggestions on how to work around this constraint, or has anyone come up against something similar before?
Sample code that is executed in the function for storing the fileMetric update:
internal static async Task UpdateMetricEntry(IAzureTableStorageService auditTableService,
    string sourceSystemReference, long addNewBytes, long addIncrementBytes, int retryDepth = 0)
{
    const int maxRetryDepth = 3; // only recursively attempt max 3 times
    var todayYearMonth = DateTime.Now.ToString("yyyyMM");
    try
    {
        // Attempt to get existing record from table storage.
        var result = await auditTableService.GetRecord<VolumeMetric>("VolumeMetrics", sourceSystemReference, todayYearMonth);
        // If the volume metrics table exists in storage - add or edit the records as required.
        if (result.TableExists)
        {
            VolumeMetric volumeMetric = result.RecordExists ?
                // Existing metric record.
                (VolumeMetric)result.Record.Clone()
                :
                // Brand new metrics record.
                new VolumeMetric
                {
                    PartitionKey = sourceSystemReference,
                    RowKey = todayYearMonth,
                    SourceSystemReference = sourceSystemReference,
                    BillingMonth = DateTime.Now.Month,
                    BillingYear = DateTime.Now.Year,
                    ETag = "*"
                };
            volumeMetric.NewVolumeBytes += addNewBytes;
            volumeMetric.IncrementalVolumeBytes += addIncrementBytes;
            await auditTableService.InsertOrReplace("VolumeMetrics", volumeMetric);
        }
    }
    catch (StorageException ex)
    {
        if (ex.RequestInformation.HttpStatusCode == 412)
        {
            // Retry to update the volume metrics. Note: retryDepth + 1, not
            // retryDepth++, which passes the old value and never deepens the recursion.
            if (retryDepth < maxRetryDepth)
                await UpdateMetricEntry(auditTableService, sourceSystemReference, addNewBytes, addIncrementBytes, retryDepth + 1);
        }
        else
            throw;
    }
}
The ETag keeps track of conflicts, and if this code gets a 412 HTTP response it retries, up to a maximum of 3 times (an attempt to mitigate the issue). My issue here is that I cannot guarantee the updates to table storage succeed across all instances of the function.
Thanks for any tips in advance!!
You can put the second part of the work into a second queue and function, maybe even put a trigger on the file updates.
Since the other operation sounds like it might take most of the time anyway, this could also take some of the heat off the second step.
You can then solve any remaining race conditions by focusing only on that function. You can use sessions to limit the concurrency effectively. In your case, the source system ID could be a possible session key. If you use that, you will only ever have one Azure Function processing data from one system at a time, effectively solving your race conditions.
https://dev.to/azure/ordered-queue-processing-in-azure-functions-4h6c
Edit: If you can't use Sessions to logically lock the resource, you can use locks via blob storage:
https://www.azurefromthetrenches.com/acquiring-locks-on-table-storage/
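Independent of any Azure SDK, the optimistic-concurrency loop reduces to a compare-and-swap with a re-read on conflict; as long as the retry count is not exhausted, every increment eventually lands. A sketch against a simulated store (JavaScript; the `replaceIf` failure plays the role of the 412 response, and the key/field names are made up for illustration):

```javascript
// Simulated table store with ETag-checked replace, mirroring how Azure
// Table Storage rejects a write when the entity changed underneath you.
const store = new Map(); // key -> { etag, value }

function read(key) {
  const e = store.get(key);
  return e ? { etag: e.etag, value: e.value } : null;
}

function replaceIf(key, etag, value) {
  const e = store.get(key);
  if (e && e.etag !== etag) return false; // simulated 412 Precondition Failed
  store.set(key, { etag: (e ? e.etag : 0) + 1, value });
  return true;
}

// Retry-until-success increment: re-read and re-apply on every conflict.
function addBytes(key, bytes, maxRetries = 10) {
  for (let i = 0; i < maxRetries; i++) {
    const current = read(key) ?? { etag: 0, value: { totalBytes: 0 } };
    const updated = { totalBytes: current.value.totalBytes + bytes };
    if (replaceIf(key, current.etag, updated)) return true;
  }
  return false; // caller must handle exhausted retries (e.g. dead-letter)
}
```

The crucial detail is re-reading the record before each retry; retrying the write with the stale value (or a wildcard ETag) would silently clobber another instance's increment.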

File upload with metadata using SharePoint Web Services

I'm trying to upload a file with metadata using SharePoint web services. The first approach I took was to use the WebRequest/WebResponse objects and then update the metadata using the Lists.asmx UpdateListItems method. This works just fine, but it creates two versions of the file. The second approach I took was to use the Copy.asmx web service and the CopyIntoItems method, which copies the file data along with the metadata. This works fine and creates v1.0, but when I try to upload the same file with some changes in the metadata (using Copy.asmx), it does not update anything. Has anybody come across the same issue, or does anyone have other ideas for implementing the required functionality?
Thanks,
Kiran
This might be a bit off topic (sorry), but I'd like to point you to a real time-saving shortcut when working with SharePoint remotely, http://www.bendsoft.com/net-sharepoint-connector/
It enables you to work with SharePoint lists and document libraries with SQL and stored procedures.
Uploading a file as a byte array
...
string sql = "CALL UPLOAD('Shared Documents', 'Images/Logos/mylogo.png', #doc)";
byte[] data = System.IO.File.ReadAllBytes("C:\\mylogo.png");
SharePointCommand cmd = new SharePointCommand(sql, myOpenConnection);
cmd.Parameters.Add("#doc", data);
cmd.ExecuteNonQuery();
...
Upload stream input
using (var fs = System.IO.File.OpenRead("c:\\150Mb.bin"))
{
    string sql = "CALL UPLOAD('Shared Documents', '150Mb.bin', #doc)";
    SharePointCommand cmd = new SharePointCommand(sql, myOpenConnection);
    cmd.Parameters.Add("#doc", fs);
    cmd.ExecuteNonQuery();
}
There are quite a few methods to simplify remote document management:
UPLOAD(listname, filename, data)
DOWNLOAD(listname, filename)
MOVE(listname1, filename1, listname2, filename2)
COPY(listname1, filename1, listname2, filename2)
RENAME(listname, filename1, filename2)
DELETE(listname, filename)
CREATEFOLDER(listname, foldername)
CHECKOUT(list, file, offline, lastmodified)
CHECKIN(list, file, comment, type)
UNDOCHECKOUT(list, file)
Cheers
