We use Azure Blob Storage to store large files (~5-20 GB).
A customer reported a problem: downloads of such files sometimes stall and never complete.
I logged download statistics and tried to download a problem file several times. One of the attempts was unsuccessful, and I now have a chart of downloaded size over time:
There were short pauses in the download from 16:00 to 16:12. The intervals between the pauses are identical, but their length increases. At 16:12 the speed dropped to a few KB/s and never returned to normal.
Here is the code that performs the download (.NET 4.0):
CloudBlobContainer container = new CloudBlobContainer(new Uri(containerSAS));
CloudBlockBlob blockBlob = container.GetBlockBlobReference(blobName);

var options = new BlobRequestOptions()
{
    ServerTimeout = new TimeSpan(10, 0, 0, 0),
    RetryPolicy = new LinearRetry(TimeSpan.FromMinutes(2), 100),
};

blockBlob.DownloadToStream(outputStream, null, options);
What could be the reason for such problems?
EDIT: To gather the statistics I used the following Stream implementation:
public class TestControlledFileStream : Stream
{
    private StreamWriter _Writer;
    private long _Size;

    public TestControlledFileStream(string filename)
    {
        this._Writer = new StreamWriter(filename);
    }

    // Log the timestamp, total bytes received so far, and the size of each chunk.
    public override void Write(byte[] buffer, int offset, int count)
    {
        _Size += count;
        _Writer.WriteLine("{0}: ({1}, {2})", DateTime.UtcNow, _Size, count);
    }

    // The remaining Stream members are minimal stubs; this stream only records statistics.
    public override bool CanRead { get { return false; } }
    public override bool CanSeek { get { return false; } }
    public override bool CanWrite { get { return true; } }
    public override long Length { get { return _Size; } }
    public override long Position { get { return _Size; } set { } }
    public override void Flush() { _Writer.Flush(); }
    public override int Read(byte[] buffer, int offset, int count) { throw new NotSupportedException(); }
    public override long Seek(long offset, SeekOrigin origin) { throw new NotSupportedException(); }
    public override void SetLength(long value) { throw new NotSupportedException(); }

    protected override void Dispose(bool disposing)
    {
        if (this._Writer != null)
            this._Writer.Dispose();
        base.Dispose(disposing);
    }
}
It looks like you are having issues downloading a large blob. I have provided some steps below to help you debug these kinds of issues.
Quick comment: You don’t need to specify the linear retry policy for this download. The default exponential retry policy should suffice for your scenario. By setting the number of allowed failures for this retry policy to 100 you may be prolonging a problem that retrying won’t fix. Is there a reason you chose to use linear retry?
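For reference, a minimal sketch relying on the default exponential retry instead (or setting it explicitly) might look like this; the backoff and attempt count shown are illustrative values, not recommendations:
// Sketch: either omit RetryPolicy entirely (exponential retry is the default)
// or set it explicitly. The values below are illustrative only.
var options = new BlobRequestOptions()
{
    RetryPolicy = new ExponentialRetry(TimeSpan.FromSeconds(4), 5),
};

blockBlob.DownloadToStream(outputStream, null, options);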
Debugging Azure Storage Issues
Storage Analytics logs are a useful first step in debugging, as they give you information about your requests such as server latency or response status code. You can check what info is available here and learn how to use logging here.
Client-side logging provides client-specific information that can help you narrow down your issue.
Using Fiddler on your client can provide visibility into the state of your download, such as whether the client is retrying or not. Note that this may obscure speed or connection issues, as Fiddler routes the traffic through a proxy.
If you can gather more detailed information regarding this issue, we will be better able to help you mitigate it.
I have an Azure Function that runs off of a queue trigger. The repository has a method to grab the connection string from the ConnectionStrings collection.
return System.Configuration.ConfigurationManager.ConnectionStrings["MyDataBase"].ToString();
This works great for the most part, but I see that it intermittently fails with a null reference exception.
Is there a way I can make this more robust?
Do Azure Functions sometimes fail to get the settings?
Should I store the setting in a different section?
I should also mention that this runs thousands of times a day, but I see this pop up only about 100 times.
Runtime version: 1.0.12299.0
Are you reading the configuration for every function call? You should consider reading it once (e.g. using a Lazy<string> and static) and reusing it for all function invocations.
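As a minimal sketch (reusing the key name "MyDataBase" from the question), caching the value with a static Lazy<string> would look something like this:
// Read the connection string once per host process, lazily and thread-safely,
// and reuse it across all function invocations.
private static readonly Lazy<string> ConnectionString = new Lazy<string>(() =>
    System.Configuration.ConfigurationManager.ConnectionStrings["MyDataBase"].ToString());

// Inside the function body:
// string cs = ConnectionString.Value;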
Maybe there is a concurrency issue when multiple threads access the code. Putting a lock around the code could help as well. ConfigurationManager.ConnectionStrings should be thread-safe, but maybe it isn't in the V1 runtime.
A similar problem was posted here, but that concerned app settings rather than connection strings. I don't think using CloudConfigurationManager is the correct solution here.
You can also try putting the connection string into the app settings, unless you are using Entity Framework.
Connection strings should only be used with a function app if you are using Entity Framework. For other scenarios use App Settings.
(via Azure Portal)
Not sure if this applies to the V1 runtime as well.
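If you do move the value into App Settings, a minimal sketch of reading it in a V1 function (assuming the setting is named "MyDataBase") could be:
// In a V1 function app, App Settings are exposed both through AppSettings
// and as environment variables; either read should work.
string connectionString =
    System.Configuration.ConfigurationManager.AppSettings["MyDataBase"]
    ?? Environment.GetEnvironmentVariable("MyDataBase");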
The solution was to add a private static string for the connection string and only read from the configuration when it is empty. I then added a retry that pauses half a second. This basically stopped the error from happening.
private static string connectionString = String.Empty;

private string getConnectionString(int retryCount)
{
    // Only hit the configuration system if the value isn't already cached.
    if (String.IsNullOrEmpty(connectionString))
    {
        if (System.Configuration.ConfigurationManager.ConnectionStrings["MyEntity"] != null)
        {
            connectionString = System.Configuration.ConfigurationManager.ConnectionStrings["MyEntity"].ToString();
        }
        else
        {
            if (retryCount > 2)
            {
                throw new Exception("Failed to Get Connection String From Application Settings");
            }
            retryCount++;
            // Pause half a second before retrying the configuration read.
            System.Threading.Thread.Sleep(500);
            return getConnectionString(retryCount);
        }
    }
    return connectionString;
}
I don't know if this is perfect, but it works. I went from seeing this exception 30 times a day to none.
I'm working on an Azure Function with an HTTP POST trigger: once a client calls it and posts JSON data, I send the data to an Event Hub and save it to a Data Lake.
Once it gets hit by high traffic (20k requests/hour), the Azure Function generates a high number of outbound TCP connections, which exceeds the plan's limit (1,920).
Is the high outbound TCP connection count caused by writing to the Event Hub, the Data Lake, or both?
Is there a way to reduce it so I don't have to pay more to upgrade our plan?
How can I debug this to troubleshoot the problem?
Here is the code that sends data to the Event Hub:
EventHubClient ehc = EventHubClient.CreateFromConnectionString(cn);

try
{
    log.LogInformation($"{CogniPointListener.LogPrefix}Sending {batch.Count} Events: {DateTime.UtcNow}");
    await ehc.SendAsync(batch);
    await ehc.CloseAsync();
}
catch (Exception exception)
{
    log.LogError($"{CogniPointListener.LogPrefix}SendingMessages: {DateTime.UtcNow} > Exception: {exception.Message}");
    throw;
}
Here is the code that sends data to the Data Lake:
var creds = new ClientCredential(clientId, clientSecret);
var clientCreds = ApplicationTokenProvider.LoginSilentAsync(tenantId, creds).GetAwaiter().GetResult();

// Create ADLS client object
AdlsClient client = AdlsClient.CreateClient(adlsAccountFQDN, clientCreds);

try
{
    using (var stream = client.CreateFile(fileName, IfExists.Overwrite))
    {
        byte[] textByteArray = Encoding.UTF8.GetBytes(str);
        stream.Write(textByteArray, 0, textByteArray.Length);
    }

    // Debug
    log.LogInformation($"{CogniPointListener.LogPrefix}SaveDataLake saved ");
}
catch (System.Exception caught)
{
    string err = $"{caught.Message}{Environment.NewLine}{caught.StackTrace}{Environment.NewLine}";
    log.LogError(err, $"{CogniPointListener.LogPrefix}SaveDataLake");
    throw;
}
Thanks,
I just raised an issue with Azure SDK https://github.com/Azure/azure-sdk-for-net/issues/26884 reporting the problem of socket exhaustion when using ApplicationTokenProvider.LoginSilentAsync.
The current version 2.4.1 of Microsoft.Rest.ClientRuntime.Azure.Authentication uses the old version 4.3.0 of Microsoft.IdentityModel.Clients.ActiveDirectory that creates a new HttpClientHandler on every call.
Creating an HttpClientHandler on every call is bad. After an HttpClientHandler is disposed, the underlying socket connections remain active for a significant time (in my experience 30+ seconds).
There's a thing called HttpClientFactory that ensures HttpClientHandler is not created frequently. Here's a guide from Microsoft explaining how to use HttpClient and HttpClientHandler properly - Use IHttpClientFactory to implement resilient HTTP requests.
I wish they reviewed their SDKs to ensure they follow their own guidelines.
Possible workaround
Microsoft.IdentityModel.Clients.ActiveDirectory since version 5.0.1-preview supports passing a custom HttpClientFactory.
IHttpClientFactory myHttpClientFactory = new MyHttpClientFactory();

AuthenticationContext authenticationContext = new AuthenticationContext(
    authority: "https://login.microsoftonline.com/common",
    validateAuthority: true,
    tokenCache: <some token cache>,
    httpClientFactory: myHttpClientFactory);
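MyHttpClientFactory is not part of the SDK; a minimal sketch of such a factory, assuming ADAL's IHttpClientFactory exposes a single GetHttpClient method, could look like this:
// Hypothetical factory that always hands out the same HttpClient, so the underlying
// HttpClientHandler (and its sockets) are not recreated on every token request.
public class MyHttpClientFactory : Microsoft.IdentityModel.Clients.ActiveDirectory.IHttpClientFactory
{
    private static readonly HttpClient SharedClient = new HttpClient();

    public HttpClient GetHttpClient()
    {
        return SharedClient;
    }
}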
So it should be possible to replicate what ApplicationTokenProvider.LoginSilentAsync does in your codebase, creating the AuthenticationContext yourself and passing in your own instance of HttpClientFactory.
The things you might need to do:
Ensure Microsoft.IdentityModel.Clients.ActiveDirectory version 5.0.1-preview or later is added to the project
Since the code is used in Azure functions, HttpClientFactory needs to be set up and injected. More info can be found in another StackOverflow answer
Replace calls to ApplicationTokenProvider.LoginSilentAsync(tenantId, creds) with something like this (an inlined version of LoginSilentAsync that passes httpClientFactory to AuthenticationContext):
var settings = ActiveDirectoryServiceSettings.Azure;
var audience = settings.TokenAudience.OriginalString;

var context = new AuthenticationContext(settings.AuthenticationEndpoint + domain,
    settings.ValidateAuthority,
    TokenCache.DefaultShared,
    httpClientFactory);

var authenticationProvider = new MemoryApplicationAuthenticationProvider(clientCredential);
var authResult = await authenticationProvider.AuthenticateAsync(clientCredential.ClientId, audience, context).ConfigureAwait(false);

var credentials = new TokenCredentials(
    new ApplicationTokenProvider(context, audience, clientCredential.ClientId, authenticationProvider, authResult),
    authResult.TenantId,
    authResult.UserInfo == null ? null : authResult.UserInfo.DisplayableId);
I really don't like replicating this logic in the workaround, but I don't think there's any other option until it's fixed properly in Microsoft.Rest.ClientRuntime.Azure.Authentication.
Good luck!
TCP connections are limited to specific numbers depending on the plan your Functions run on (Consumption, or a dedicated plan at any B/S/P level).
For high workloads I prefer to either
A: Use a queue with a separate function, limiting the concurrency via the function batch size and other settings
or
B: Use a SemaphoreSlim to control the concurrency of outgoing traffic, as sketched below. (https://learn.microsoft.com/de-de/dotnet/api/system.threading.semaphoreslim?redirectedfrom=MSDN&view=netframework-4.7.2)
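A minimal sketch of option B, assuming a per-instance limit of 50 concurrent outbound operations (an arbitrary illustrative number):
// Throttle outbound work per instance; the limit of 50 is illustrative only.
private static readonly SemaphoreSlim Throttle = new SemaphoreSlim(50);

private static async Task SendThrottledAsync(Func<Task> sendOperation)
{
    await Throttle.WaitAsync();
    try
    {
        await sendOperation();
    }
    finally
    {
        Throttle.Release();
    }
}

// Usage, e.g.: await SendThrottledAsync(() => ehc.SendAsync(batch));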
We've been having a problem for several months where the site becomes completely unresponsive for 5-15 minutes every day. We have added a ton of request logging, enabled DEBUG logging, and have finally found a pattern: Approximately 2 minutes prior to the outages (in every single log file I've looked at, going back to the beginning), the following lines appear:
2017-09-26 15:13:05,652 [P7940/D9/T76] DEBUG Umbraco.Web.PublishedCache.XmlPublishedCache.XmlCacheFilePersister - Timer: release.
2017-09-26 15:13:05,652 [P7940/D9/T76] DEBUG Umbraco.Web.PublishedCache.XmlPublishedCache.XmlCacheFilePersister - Run now (sync).
From what I gather this is the process that rebuilds the umbraco.config, correct?
We have ~40,000 nodes, so I can't imagine this would be the quickest process to complete. The strange thing, however, is that the CPU and memory on the Azure Web App do not spike during these outages, which would seem to point to disk I/O being the bottleneck.
This raises a few questions:
Is there a way to schedule this task in a way that it only runs during off-peak hours?
Are there performance improvements in the newer versions (we're on 7.6.0) that might improve this functionality?
Are there any other suggestions to help correct this behavior?
Hosting environment:
Azure App Service B2 (Basic)
SQL Azure Standard (20 DTUs) - DTU usage peaks at 20%, so I don't think there's anything there. Just noting for completeness
Azure Storage for media storage
Azure CDN for media requests
Thank you so much in advance.
Update 10/4/2017
If it helps, it appears that these particular log entries correspond with the first publish of the day.
I don't feel like 40,000 nodes is too much for Umbraco, but if you want to schedule republishes, you can do this:
You can programmatically call a cache refresh using:
ApplicationContext.Current.Services.ContentService.RePublishAll();
(Umbraco source)
You could create an API controller which you could call periodically via a URL. The controller would probably look something like this:
public class CacheController : UmbracoApiController
{
    [HttpGet]
    public HttpResponseMessage Republish(string pass)
    {
        if (pass != "passcode")
        {
            return Request.CreateResponse(HttpStatusCode.Unauthorized, new
            {
                success = false,
                message = "Access denied."
            });
        }

        var result = Services.ContentService.RePublishAll();

        if (result)
        {
            return Request.CreateResponse(HttpStatusCode.OK, new
            {
                success = true,
                message = "Republished"
            });
        }

        return Request.CreateResponse(HttpStatusCode.InternalServerError, new
        {
            success = false,
            message = "An error occurred"
        });
    }
}
You could then periodically ping this URL:
/umbraco/api/cache/republish?pass=passcode
I have a blog post you can read on how to schedule events like these. I recommend just using the Windows Task Scheduler to ping the URL: https://harveywilliams.net/blog/better-task-scheduling-in-umbraco#windows-task-scheduler
I have a Function app in Azure that is triggered when an item is put on a queue. It looks something like this (greatly simplified):
public static async Task Run(string myQueueItem, TraceWriter log)
{
    using (var client = new HttpClient())
    {
        client.BaseAddress = new Uri(Config.APIUri);
        client.DefaultRequestHeaders.Accept.Add(new MediaTypeWithQualityHeaderValue("application/json"));

        StringContent httpContent = new StringContent(myQueueItem, Encoding.UTF8, "application/json");

        HttpResponseMessage response = await client.PostAsync("/api/devices/data", httpContent);
        response.EnsureSuccessStatusCode();

        string json = await response.Content.ReadAsStringAsync();
        ApiResponse apiResponse = JsonConvert.DeserializeObject<ApiResponse>(json);

        log.Info($"Activity data successfully sent to platform in {apiResponse.elapsed}ms. Tracking number: {apiResponse.tracking}");
    }
}
This all works great and runs pretty well. Every time an item is put on the queue, we send the data to some API on our side and log the response. Cool.
The problem happens when there's a big spike in "the thing that generates queue messages" and a lot of items are put on the queue at once. This tends to happen around 1,000 - 1,500 items in a minute. The error log will have something like this:
2017-02-14T01:45:31.692 mscorlib: Exception while executing function: Functions.SendToLimeade. f-SendToLimeade__-1078179529: An error occurred while sending the request. System: Unable to connect to the remote server. System: Only one usage of each socket address (protocol/network address/port) is normally permitted 123.123.123.123:443.
At first, I thought this was an issue with the Azure Function app running out of local sockets, as illustrated here. However, then I noticed the IP address. The IP address 123.123.123.123 (of course changed for this example) is our IP address, the one that the HttpClient is posting to. So, now I'm wondering if it is our servers running out of sockets to handle these requests.
Either way, we have a scaling issue going on here. I'm trying to figure out the best way to solve it.
Some ideas:
If it's a local socket limitation, the article above has an example of increasing the local port range using Req.ServicePoint.BindIPEndPointDelegate. This seems promising, but what do you do when you truly need to scale? I don't want this problem coming back in 2 years.
If it's a remote limitation, it looks like I can control how many messages the Functions runtime will process at once. There's an interesting article here that says you can set serviceBus.maxConcurrentCalls to 1 and only a single message will be processed at once. Maybe I could set this to a relatively low number. Now, at some point our queue will be filling up faster than we can process them, but at that point the answer is adding more servers on our end.
Multiple Azure Functions apps? What happens if I have more than one Azure Functions app and they all trigger on the same queue? Is Azure smart enough to divvy up the work among the Function apps and I could have an army of machines processing my queue, which could be scaled up or down as needed?
I've also come across keep-alives. It seems to me if I could somehow keep my socket open as queue messages were flooding in, it could perhaps help greatly. Is this possible, and any tips on how I'd go about doing this?
Any insight on a recommended (scalable!) design for this sort of system would be greatly appreciated!
I think the code error is because of: using (var client = new HttpClient())
Quoted from Improper instantiation antipattern:
this technique is not scalable. A new HttpClient object is created for each user request. Under heavy load, the web server may exhaust the number of available sockets.
I think I've figured out a solution for this. I've been running these changes for the past 6 hours, and I've had zero socket errors. Before, I would get these errors in large batches every 30 minutes or so.
First, I added a new class to manage the HttpClient.
public static class Connection
{
    public static HttpClient Client { get; private set; }

    static Connection()
    {
        Client = new HttpClient();

        Client.BaseAddress = new Uri(Config.APIUri);
        Client.DefaultRequestHeaders.Add("Connection", "Keep-Alive");
        Client.DefaultRequestHeaders.Add("Keep-Alive", "timeout=600");
        Client.DefaultRequestHeaders.Accept.Add(new MediaTypeWithQualityHeaderValue("application/json"));
    }
}
Now, we have a static instance of HttpClient that we use for every call to the function. From my research, keeping HttpClient instances around for as long as possible is highly recommended, everything is thread-safe, and HttpClient will queue up requests and optimize requests to the same host. Notice I also set the Keep-Alive headers (I think this is the default, but I figured I'd be explicit).
In my function, I just grab the static HttpClient instance like:
var client = Connection.Client;
StringContent httpContent = new StringContent(myQueueItem, Encoding.UTF8, "application/json");
HttpResponseMessage response = await client.PostAsync("/api/devices/data", httpContent);
response.EnsureSuccessStatusCode();
I haven't really done any in-depth analysis of what's happening at the socket level (I'll have to ask our IT guys if they're able to see this traffic on the load balancer), but I'm hoping it just keeps a single socket open to our server and makes a bunch of HTTP calls as the queue items are processed. Anyway, whatever it's doing seems to be working. Maybe someone has some thoughts on how to improve.
If you use consumption plan instead of Functions on a dedicated web app, #3 more or less occurs out of the box. Functions will detect that you have a large queue of messages and will add instances until queue length stabilizes.
maxConcurrentCalls only applies per instance, allowing you to limit per-instance concurrency. Basically, your processing rate is maxConcurrentCalls * instanceCount.
The only way to control global throughput would be to use Functions on dedicated web apps of the size you choose. Each app will poll the queue and grab work as necessary.
The best scaling solution would improve the load balancing on 123.123.123.123 so that it can handle any number of requests from Functions scaling up/down to meet queue pressure.
Keep alive afaik is useful for persistent connections, but function executions aren't viewed as a persistent connection. In the future we are trying to add 'bring your own binding' to Functions, which would allow you to implement connection pooling if you liked.
I know the question was answered long ago, but in the meantime Microsoft has documented the anti-pattern that you were using.
Improper Instantiation antipattern
I upload gzipped files to an Azure Storage Container (input). I then have a WebJob that is supposed to pick up the Blobs, decompress them and drop them into another Container (output). Both containers use the same storage account.
My problem is that it doesn't process all Blobs. It always seems to miss 1. This morning I uploaded 11 blobs to the input Container and only 10 were processed and dumped into the output Container. If I upload 4 then 3 will be processed. The dashboard will show 10 invocations even though 11 blobs have been uploaded. It doesn't look like it gets triggered for the 11th blob. If I only upload 1 it seems to process it.
I am running the website in Standard Mode with Always On set to true.
I have tried:
Writing code like the Azure Samples (https://github.com/Azure/azure-webjobs-sdk-samples).
Writing code like the code in this article (http://azure.microsoft.com/en-us/documentation/articles/websites-dotnet-webjobs-sdk-get-started).
Using Streams for the input and output instead of CloudBlockBlobs.
Various combinations of closing the input, output and Gzip Streams.
Having the UnzipData code in the Unzip method.
This is my latest code. Am I doing something wrong?
public class Functions
{
    public static void Unzip(
        [BlobTrigger("input/{name}.gz")] CloudBlockBlob inputBlob,
        [Blob("output/{name}")] CloudBlockBlob outputBlob)
    {
        using (Stream input = inputBlob.OpenRead())
        {
            using (Stream output = outputBlob.OpenWrite())
            {
                UnzipData(input, output);
            }
        }
    }

    public static void UnzipData(Stream input, Stream output)
    {
        using (var gzippedStream = new GZipStream(input, CompressionMode.Decompress))
        {
            gzippedStream.CopyTo(output);
        }
    }
}
As per Victor's comment above it looks like it is a bug on Microsoft's end.
Edit: I don't get the downvote. There is a problem and Microsoft are going to fix it. That is the answer to why some of my blobs are ignored...
"There is a known issue about some Storage log events being ignored. Those events are usually generated for large files. We have a fix for it but it is not public yet. Sorry for the inconvenience. – Victor Hurdugaci Jan 9 at 12:23"
Just as a workaround, what if you don't listen to the blob directly but instead bring a queue in between: when you write to the input blob container, also write a message about the new blob to the queue. Let the WebJob listen to this queue; once a message arrives, the WebJob function reads the file from the input blob container and copies it into the output blob container.
Does this model work for you?
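For illustration, a minimal sketch of that queue-based approach (the queue name "blob-uploads" and the convention that the message body is the blob file name are assumptions, not part of the original setup):
// Sketch: the uploader drops the blob's file name (e.g. "myfile.gz") onto a queue after
// uploading; this function is triggered by the queue message rather than by the blob.
public static void UnzipFromQueue(
    [QueueTrigger("blob-uploads")] string blobName,                 // hypothetical queue name
    [Blob("input/{queueTrigger}", FileAccess.Read)] Stream input,
    [Blob("output/{queueTrigger}", FileAccess.Write)] Stream output)
{
    using (var gzippedStream = new GZipStream(input, CompressionMode.Decompress))
    {
        gzippedStream.CopyTo(output);
    }
}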