The Windows Azure Caching Document says
If possible, store and reuse the same DataCacheFactory object to conserve memory and optimize performance."
Has anyone seen any metrics or any quantification of how expensive this is?
One argument is that
"MaxConnectionsToServer setting... determines the number of chennels per DataCacheFactory that are opened to the cache cluster."
So if MaxConnectionsToServer = 1 and DataCacheFactory is a singleton in your app, then you've effectively syncronized all requests to your web server!
However, there is a lot of indication that DataCacheFactory should be a singleton (i.e. put in Application_OnStart).
This is critical and I can't believe it is not in the Microsoft documentation. Is the DataCacheFactory treated the same in AppFabric, Azure Shared Caching, and Azure Caching? I just have a difficult time believing that Microsoft designed caching in a way that requires a singleton factory object. This is like requiring anyone that uses SqlConnection to have a singleton SqlConnectionFactory object in their application.
So, considering a relatively average web app (For example, 1,000s of requests per hour, ~ 100 objects in cache, the average request accesses 5 cached objects):
By default (and recommendation) how many Factory objects should there be at one time?
How long does it take to create a DataCacheFactory reference?
How long does it take to create a DataCache reference?
Should their only be 1 DataCacheFactory object per app and only 1 DataCache reference per request?
EDIT (answers in progress):
(1/2). Let Azure connection pooling handle the Factory objects
(3). Still testing...
(4). Still trying to figure out if I should re-use DataCache references
How about that, Microsoft did document best practices and it does involve connection pooling! Although not easy to find (at least for me).
It appears that the answer is simply to not use the DataCacheFactory object when implementing the newer Azure Caching and just access the DataCache object directly
"There are also new overloads to the DataCache constructor that make
it simpler to create a cache client. In the past, it was always
necessary to create a DataCacheFactory object that returns the target
cache. Now it is possible to create the cache with the DataCache
constructor directly. The following example creates a client to the
default cache from the default section of the configuration file."
DataCache cache = new DataCache();
And to use connection pooling.
"With the latest Windows Azure SDK, connection pooling is enabled by
default when you define your cache settings in the application or web
configuration files. Because of this default behavior, it is important
to set the size of the connection pool correctly. The connection pool
size is configured with the maxConnectionsToServer attribute on the
dataCacheClient element."
I wish Microsoft gave some guidance on how to configure the maxConnectionsToServer correctly but that can be determined through testing. The automatic connection pooling with the new Azure Caching is pretty cool :)
I'm assuming you're referring to Shared Caching Service (previously known as Azure AppFabric Cache)
There is no cost per an individual connection. However, when you purchase a Cache account, you're paying not for only the size of a cache account but also for a particular number of connections.
Smallest cache account has 10 connections per hour, while most expensive one allows for 160 concurrent connections. Thus, if you're concerned that you may run out of connections given the size of your account, it maybe prudent to be careful as to how many connections you open from your app.
More details
http://msdn.microsoft.com/en-us/library/windowsazure/hh697522.aspx
Related
How to control the usage of APIs by consumers during a given period in Azure function app Http trigger. Simply how to set a requests throttle when exceed the request limit, and please let me know a solution without using azure API Gateway.
The only control you have over host creation in Azure Functions an obscure application setting: WEBSITE_MAX_DYNAMIC_APPLICATION_SCALE_OUT. This implies that you can control the number of hosts that are generated, though Microsoft claim that “it’s not completely foolproof” and “is not fully supported”.
From my own experience it only throttles host creation effectively if you set the value to something pretty low, i.e. less than 50. At larger values then its impact is pretty limited. It’s been implied that this feature will be will be worked on in the future, but the corresponding issue has been open in GitHub with no update since July 2017.
For more details, you could refer to this article.
You can use the initialVisibilityDelay property of the CloudQueue.AddMessage function as outlined in this blog post.
This will throttle the message to prevent the 429 error if implemented correctly using the leaky bucket algorithm or equivalent.
I have an Azure Function on a timer that activates every minute, which calls an API which returns an integer value. I store this value in SQL.
I then have another Azure Function that can be queried by the user to retrieve the integer value. This query could in theory be as high as hundreds or thousands of times per second.
Rather than have the second Azure Function query SQL every single time it gets a request, I would like it to cache the value in memory. If the cache were perfect there would be no need for SQL at all, but because Functions can scale up and also seem to lose their cache periodically there has to be some persistent storage.
Is it just a case of a static variable within the function to cache the value, and another with the last date retrieved? Or is there another type of caching that I can use within the function?
I understand there are solutions such as Redis but they seem pretty overkill to spin up just for a single integer value. I'm also not even sure if Azure SQL itself would cache the value when it's requested.
My question is, would a static variable work (if it's null/reset then we'd just do a quick SQL query to get the value) and actually persist? Or does an alternative like redis or similar exist that wouldn't be overkill for this application? And finally, is there actually any harm (performance problems) in hammering SQL over and over to retrieve a single value (i.e. is it clever enough to cache already so there's not a significant performance hit vs. querying a variable in memory)?
Really depends. If you understand the limitations of using in-memory cache in an azure function, and your business case is fine with those limitations, you should use it.
The main thing is you can't invalidate cache.
So for example, if your number changes, it can be not usable for you. You will have cases where a container for your azure is spinning, and it has an old value. The same user could get different values on each request because who knows which instance he will hit, and what that instance is caching.
If you number is something that is set only once and doesn't change, you don't have this issue.
And another important thing is that you still make quite a few requests just to cache it. Every new container will have to cache it for itself, while centralized cache would do it only once. This can be fine for something smaller, but if the thing you're caching really takes significant amount of time, or if the resources of the service are super limited, you would be a lot more efficient with centralized cache.
No matter what, caching in Azure Function level still reduces the load, and there's no reason to make requests when you don't have to.
To answer your sql question, yes, most likely the SQL server will cache it too, but your azure function still needs to establish a connection to sql server, make the request and kill the connection.
Azure functions best practices states that functions should be stateless and your state information should be with data. I think Redis is still the better option that SQL.
I'm using Azure Functions with queue triggers for part of our workload. The specific function queries the database and this creates problems with scaling since the large concurrent number of function instances pinging the db results in maximum allowed number of Azrue DB connections being hit constantly.
This article https://learn.microsoft.com/en-us/azure/azure-functions/manage-connections lists HttpClient as one of those resources that should be made static.
Should database access also be made static with static SqlConnection to resolve this issue, or would that cause some other problems by keeping the constant connection object?
Should database access also be made static with static SqlConnection
Definitely not. Each function invocation should open a new SqlConnection, with the same connection string, in a using block. It's not really clear how many concurrent Function Invocations the runtime will make to a single instance of your application. But if it's more than 1, then a singleton SqlConnection is a bad thing.
I wonder exactly which limit you're hitting in SQL Database, the connection limit or the concurrent request limit? In either case I'm a bit surprised (not a Functions expert) that you get that many concurrent function invocations, so there might be something else going on. Like you're leaking SqlConnections.
But reading the Functions docs, my guess is that the functions runtime is scaling by launching multiple instances of your function app. Your .NET app could scale in a single process, but that's apparently not the way Functions works. Each instance of your Functions app has it's own ConnectionPool for SQL Server, and by default each ConnectionPool can have 100 connections.
Perhaps if you sharply limit the Max Pool Size in your connection string, won't have so many connections open. When you hit the Max Pool Size, new calls to SqlConnection.Open() will block for up to 30 seconds waiting for a pooled SqlConnection to become available. So this not only limits the connection use for each instance of your application, it throttles your throughput under load.
You can use the configuration settings in host.json to control the level of concurrency your functions execute at per instance and the max scaleout setting to control how many instances you scale out to. This will let you control the total amount of load put on your database.
For future readers, the documentation has been updated with some information about the SQL connection stating:
Your function code may use the .NET Framework Data Provider for SQL Server (SqlClient) to make connections to a SQL relational database. This is also the underlying provider for data frameworks that rely on ADO.NET, such as Entity Framework. Unlike HttpClient and DocumentClient connections, ADO.NET implements connection pooling by default. However, because you can still run out of connections, you should optimize connections to the database. For more information, see SQL Server Connection Pooling (ADO.NET).
So, as David Browne already mentioned, you shouldn't make your SqlConnection static.
Microsoft.Azure.Devices.ServiceClient and Microsoft.Azure.Devices.RegistryManager both have ConnectFromConnectionString and CloseAsync methods.
Should we use them like we use other .NET connection-close patterns, such as ADO.NET connections, Redis connections, Sockets' etc.? When I use objects like those I try to Close or Disposable.Dispose() of them as soon as possible.
What's the upside and downside to doing the same with the Microsoft.Azure.Devices objects when accessing the same IOT Hub? I have running code that treats the individual RegistryManager and ServiceClient instances as singletons, which are used throughout an application's lifetime -- which may be weeks or months. Are we short circuiting ourselves by keeping these objects "open" for this duration?
As pointed by #David R. Williamson, and this discussion on GitHub, it is generally recommended to register RegistryManager and ServiceClient as singletons.
For ASP NET Core or Azure Functions, it can be done as follows at startup:
ConfigureServices(services)
{
services.AddSingleton<RegistryManager>(RegistryManager.CreateFromConnectionString("connectionstring"));
}
I'm on the Azure IoT SDK team. We recommend keeping the service client instances around, depending on your usage pattern.
While it is generally fine to init/close/dispose the Azure IoT service clients, you can run into the HttpClient socket exhaustion problem. We've done work to prevent it from happening if you keep a singleton of the client around, but when the client is disposed 2 HttpClient instances are also disposed.
There's little harm in keeping them around and will save your program execution time in init/dispose.
We're working on a v2 where we'll allow the user to pass in their own copy of HttpClient, so you could keep that around and shut down our client more frequently.
This is a best practices question.
Per this best practices article and per MSDN, the OrganizationServiceProxy is not thread safe.
If you have a multi threaded client application in which you are creating an instance of an
OrganizationServiceContext (on a per thread basis), the constructor of which accepts an
IOrganizationService instance and you pass in a global instance of the OrganizationServiceProxy
(i.e a static instance allocated once at the "process level"), will this cause threading issues and/or if the OrganizationServiceProxy instance faults, will it affect operations that the threads try to perform on their own "local" instance of the OrganizationServiceContext?
My belief is that it will, and that an OrganizationServiceProxy instance needs to be created on a "per thread" basis and that each OrganizationServiceContext in a multi threaded application would need its own corresponding OrganizationServiceProxy instance.
I'm posting this to get confirmation of the above.
Also, the article indicates
The service proxy class performs the metadata download and user authentication by using the following class methods
IServiceManagement<IOrganizationService> orgServiceManagement =
ServiceConfigurationFactory.CreateManagement<IOrganizationService>(
new Uri(organizationUrl))
AuthenticationCredentials authCredentials = orgServiceManagement.Authenticate(credentials)
By caching the service management and authenticated credential objects, your application can more efficiently construct the service proxy objects more than one time per application session
If I try to execute the above API calls manually, in Active directory authentication mode, the authCredentials.SecurityTokenResponse is null as indicated by MSDN
Is there a way to perform the authentication just once for AD mode and pass an authenticated SecurityTokenResponse to a newly created OrganizationServiceProxy via the following constructor?
OrganizationServiceProxy (IServiceConfiguration, SecurityTokenResponse)
so that you don't have to take the authentication and metadata download hit on a "per thread basis" when constructing the OrganizationServiceProxy instance per thread and just take the hit once?
Yes, you will definitely have issue if you attempt multi-threaded operations on a single IOrganization service.
We have two basic multi-threaded CRM applications: batch processors, and another web app. For the batch programs I've found it works better to only have 10 different threads, and to batch up the work among the 10 different threads. So if you're inserting 100,000 records, split them into 10 batches of 10,000, a single organization service for each thread.
We also have a website that does a lot of CRM interactions so there is no real way to batch the requests, so we created a CRM connection pool to reuse any open, already authenticated connections.
Of course this won't work at all if you're not using some system service account.