How to use CosmosClient.CreateAndInitializeAsync() with CosmosClientBuilder.Build()

This YouTube video (at 27:20) talks about populating the cache with routing information to avoid latency during a cold start.
You can either try to read a document you know doesn't exist, or you can use CosmosClient.CreateAndInitializeAsync().
I already have this code set up:
private async Task<Container> CreateContainerAsync(string endpoint, string authKey)
{
    var cosmosClientBuilder = new CosmosClientBuilder(
            accountEndpoint: endpoint,
            authKeyOrResourceToken: authKey)
        .WithConnectionModeDirect(portReuseMode: PortReuseMode.PrivatePortPool, idleTcpConnectionTimeout: TimeSpan.FromHours(1))
        .WithApplicationName(UserAgentSuffix)
        .WithConsistencyLevel(ConsistencyLevel.Session)
        .WithApplicationRegion(Regions.AustraliaEast)
        .WithRequestTimeout(TimeSpan.FromSeconds(DatabaseRequestTimeoutInSeconds))
        .WithThrottlingRetryOptions(TimeSpan.FromSeconds(DatabaseMaxRetryWaitTimeInSeconds), DatabaseMaxRetryAttemptsOnThrottledRequests);

    var client = cosmosClientBuilder.Build();
    var databaseResponse = await CreateDatabaseIfNotExistsAsync(client).ConfigureAwait(false);
    var containerResponse = await CreateContainerIfNotExistsAsync(databaseResponse.Database).ConfigureAwait(false);
    return containerResponse;
}
Is there any way to incorporate CosmosClient.CreateAndInitializeAsync() with it to populate the cache?
If not, is it ok to do this to populate the cache?
public class CosmosClientWrapper
{
    public CosmosClientWrapper(IKeyVaultFacade keyVaultFacade)
    {
        var container = CreateContainerAsync(endpoint, authenticationKey).GetAwaiter().GetResult();
        try
        {
            // Read a document that doesn't exist to populate the routing info:
            container.ReadItemAsync<object>(Guid.NewGuid().ToString(), PartitionKey.None).GetAwaiter().GetResult();
        }
        catch (CosmosException ex) when (ex.StatusCode == System.Net.HttpStatusCode.NotFound)
        {
            // Expected: the item does not exist; the read only warms up the routing caches.
        }
    }
}

The point of CreateAndInitialize or BuildAndInitialize is to pre-establish the connections required to perform data-plane operations against the desired containers (see https://learn.microsoft.com/azure/cosmos-db/nosql/sdk-connection-modes#routing).
If the containers do not exist, it makes no sense to use CreateAndInitialize or BuildAndInitialize: there are no connections to pre-establish or warm up, because there are no target backend endpoints to connect to. That is why the database/container information is required; the only benefit is warming up the connections to the backend machines that serve those containers.

Please see CosmosClientBuilder.BuildAndInitializeAsync, which builds the CosmosClient and initializes the provided containers. I believe this is what you are looking for.
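For example, here is a minimal sketch of how the builder from the question could be adapted (the database and container names are placeholders, and the containers must already exist, e.g. created beforehand by the CreateDatabaseIfNotExistsAsync/CreateContainerIfNotExistsAsync calls):

private async Task<Container> BuildAndWarmUpAsync(string endpoint, string authKey)
{
    var builder = new CosmosClientBuilder(endpoint, authKey)
        .WithConnectionModeDirect()
        .WithConsistencyLevel(ConsistencyLevel.Session);

    // BuildAndInitializeAsync opens connections to the backend replicas of the
    // listed (database, container) pairs before returning, so the first
    // data-plane request does not pay the warm-up cost.
    CosmosClient client = await builder.BuildAndInitializeAsync(
        new List<(string databaseId, string containerId)>
        {
            ("MyDatabase", "MyContainer") // hypothetical names
        });

    return client.GetContainer("MyDatabase", "MyContainer");
}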

Related

Azure Cosmos DB client throws "HttpRequestException: attempt was made to access a socket in a way forbidden by its access permissions" underneath

I use CosmosClient from SDK Microsoft.Azure.Cosmos 3.28.0 in ASP.NET Core 3.1 in an Azure Durable Function. This client gets and sends data from/to my Cosmos instance (Core (SQL)) and it works fine, but I see that it constantly throws an exception in the following HTTP request for metadata:
GET 169.254.169.254/metadata/instance
System.Net.Http.HttpRequestException: An attempt was made to access a socket in a way forbidden by its access permissions.
I use the following configuration:
private static void RegisterCosmosDbClient(ContainerBuilder builder)
{
    builder.Register(c => new SocketsHttpHandler()
    {
        PooledConnectionLifetime = TimeSpan.FromMinutes(10), // Customize this value based on desired DNS refresh timer
        MaxConnectionsPerServer = 20, // Customize the maximum number of allowed connections
    }).As<SocketsHttpHandler>().SingleInstance();

    builder.Register(x =>
    {
        var cosmosDbOptions = x.Resolve<CosmosDbOptions>();
        var socketsHttpHandler = x.Resolve<SocketsHttpHandler>();
        return new CosmosClient(cosmosDbOptions.ConnectionString, new CosmosClientOptions()
        {
            ConnectionMode = ConnectionMode.Direct,
            PortReuseMode = PortReuseMode.PrivatePortPool,
            IdleTcpConnectionTimeout = new TimeSpan(0, 23, 59, 59),
            SerializerOptions = new CosmosSerializationOptions()
            {
                PropertyNamingPolicy = CosmosPropertyNamingPolicy.CamelCase
            },
            HttpClientFactory = () => new HttpClient(socketsHttpHandler, disposeHandler: false)
        });
    })
    .AsSelf()
    .SingleInstance();
}
I also tried the approach of passing an IHttpClientFactory from this blog, but it didn't help.
It looks like there are no new sockets available in your environment, which is why you are getting the socket-forbidden error. Please review how to manage connections for Azure Cosmos DB clients; you should use a singleton Azure Cosmos DB client for the lifetime of your application to resolve the issue. If you are still facing the issue while using the singleton object, please let me know so we can review it further.
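For an Azure Functions app, a minimal sketch of that singleton pattern (assuming the Microsoft.Azure.Functions.Extensions package; the app-setting name is a placeholder) could look like this:

[assembly: FunctionsStartup(typeof(MyApp.Startup))]
namespace MyApp
{
    public class Startup : FunctionsStartup
    {
        public override void Configure(IFunctionsHostBuilder builder)
        {
            // Register one CosmosClient for the lifetime of the host so that
            // every function invocation reuses the same connections.
            builder.Services.AddSingleton(s => new CosmosClient(
                Environment.GetEnvironmentVariable("CosmosDbConnectionString"),
                new CosmosClientOptions { ConnectionMode = ConnectionMode.Direct }));
        }
    }
}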
That particular IP and path belong to the Azure Instance Metadata Service: https://learn.microsoft.com/azure/virtual-machines/windows/instance-metadata-service?tabs=windows
The SDK is attempting to detect the Azure environment information. It could mean that for Durable Functions this endpoint is not available.
This does not affect SDK operations and should not block you from performing other actions on the CosmosClient instance.

Mikro-orm inter-service transactions in NestJS

I am evaluating Mikro-Orm for a future project. There are several questions for which I either could not find an answer in the docs or did not fully understand it.
Let me describe a minimal complex example (NestJS): I have an order processing system with two entities, Orders and Invoices, as well as a counter table for sequential invoice numbers (a legal requirement). It's important to mention that the OrderService create method is not always called by a controller, but also via a cron job/queue system. My question is about the use case of creating a new order:
class OrderService {
    async createNewOrder(orderDto) {
        const order = new Order();
        order.customer = orderDto.customer;
        order.items = orderDto.items;
        const invoice = await this.invoiceService.create(orderDto.items);
        order.invoice = invoice;
        await this.em.persistAndFlush(order);
        return order;
    }
}

class InvoiceService {
    async create(items): Promise<Invoice> {
        const invoice = new Invoice();
        invoice.number = await this.invoiceNumberService.getNextInSequence();
        // the next two lines call external APIs; if they throw, the whole transaction should roll back
        const pdf = await this.pdfCreator.createPdf(invoice);
        const upload = await s3Api.upload(pdf);
        return invoice;
    }
}

class InvoiceNumberService {
    async getNextInSequence(): Promise<number> {
        return db.collection("counter").findOneAndUpdate({ type: "INVOICE" }, { $inc: { value: 1 } });
    }
}
The whole use case of creating a new order, with all subsequent service calls, should happen in one Mikro-Orm transaction. So if anything throws in OrderService.createNewOrder() or in one of the subsequently called methods, the whole transaction should be rolled back.
Mikro-Orm does not allow the atomic update-increment shown in InvoiceNumberService. I can fall back to the native mongo driver. But how do I ensure that the call to collection.findOneAndUpdate() shares the same transaction as the entities managed by Mikro-Orm?
Mikro-Orm needs a unique request context. In the examples for NestJS, this unique context is created at the controller level. In the example above the service methods are not necessarily called by a controller, so I would need a new context for each call to OrderService.createNewOrder() that has a lifetime scoped to the function call, correct? How can I achieve this?
How can I share the same request context between services? In the example above, InvoiceService and InvoiceNumberService would need the same context as OrderService for Mikro-Orm to work properly.
I will start with the bad news: mongodb transactions are not yet supported in MikroORM (although they will probably land within weeks; the PoC is already implemented). You can subscribe here for updates: https://github.com/mikro-orm/mikro-orm/issues/34
But let me answer the rest as it will then apply:
You can use const collection = (em as EntityManager<MongoDriver>).getConnection().getCollection('counter'); to get the collection from the internal mongo connection instance. You can also use orm.em.getTransactionContext() to get the current transaction context (currently implemented only in the sql drivers, but in the future this will probably return the session object in mongo).
Also note that in the mongo driver, implicit transactions won't be enabled by default (it will be configurable, though), so you will need to use explicit transaction demarcation via em.transactional(...).
The RequestContext helper works automatically. You just register it as a middleware (done automatically in the nestjs orm adapter) and then your request handler (route/endpoint/controller method) is run inside a domain that shares the context. Thanks to this, all services in the DI can share singleton instances of repositories, but they will automatically pick the right context from the domain.
You basically have this automatic request context, and then you can create new (nested) contexts manually via em.transactional(...).
https://mikro-orm.io/docs/transactions/#approach-2-explicitly

One connection per user

I know that this question was asked already, but it seems that some more things have to be clarified. :)
The database is designed so that each user has the proper privileges to read documents, so the connection pool needs to hold connections for different users, which is outside the usual connection-pool concept. For optimization and performance I need to run a so-called "user preparation", which includes setting session variables, calculating and caching values, etc., and only then execute queries.
For now, I have two solutions. In the first solution, I first check whether everything is prepared for the user and then execute one or more queries. If it is not prepared, I call the "user preparation" first and then execute the query or queries. With this solution I lose a lot of performance, because I have to do the check every time, so I decided on another solution.
The second solution uses a "database pool" where each pool is for one user. Only on the first connection, useCount === 0 (I do not use {direct: true}), do I call the "user preparation" (a stored procedure that sets some session variables and prepares the cache), and then I execute the SQL queries.
I do the user preparation in the connect event within the initOptions parameter used to initialize pg-promise. I followed the pg-promise-demo, so I do not need to explain the rest of the code.
The code for the pgp initialization, including the database-pooling wrapper, looks like this:
import * as promise from "bluebird";
import pgPromise from "pg-promise";
import { IDatabase, IMain, IOptions } from "pg-promise";
import { IExtensions, ProductsRepository, UsersRepository, Session, getUserFromJWT } from "../db/repos";
import { dbConfig } from "../server/config";

// pg-promise initialization options:
export const initOptions: IOptions<IExtensions> = {
    promiseLib: promise,
    async connect(client: any, dc: any, useCount: number) {
        if (useCount === 0) {
            try {
                await client.query(pgp.as.format("select prepareUser($1)", [getUserFromJWT(session.JWT)]));
            } catch (error) {
                console.error(error);
            }
        }
    },
    extend(obj: IExtensions, dc: any) {
        obj.users = new UsersRepository(obj);
        obj.products = new ProductsRepository(obj);
    }
};

type DB = IDatabase<IExtensions> & IExtensions;
const pgp: IMain = pgPromise(initOptions);

class DBPool {
    private pool = new Map();
    public get = (ct: any): DB => {
        const checkConfig = {...dbConfig, ...ct};
        const {host, port, database, user} = checkConfig;
        const dbKey = JSON.stringify({host, port, database, user});
        let db: DB = this.pool.get(dbKey) as DB;
        if (!db) {
            // const pgp: IMain = pgPromise(initOptions);
            db = pgp(checkConfig) as DB;
            this.pool.set(dbKey, db);
        }
        return db;
    }
}
export const dbPool = new DBPool();

import diagnostics = require("./diagnostics");
diagnostics.init(initOptions);
And the web API looks like this:
GET("/api/getuser/:id", (req: Request) => {
    const user = getUserFromJWT(session.JWT);
    const db = dbPool.get({ user });
    return db.users.findById(req.params.id);
});
I'm interested in whether the source code instantiates pgp correctly, or whether it should be instantiated within the if block inside the get method (the commented line)?
I've seen that pg-promise uses a DatabasePool singleton exported from dbPool.js, which is similar to my DBPool class, except that its purpose is to give the "WARNING: Creating a duplicate database object for the same connection". Is it possible to use the DatabasePool singleton instead of my dbPool singleton?
It seems to me that dbContext (the second parameter in pgp initialization) could solve my problem, but only if it could be forwarded as a function, not as a value or object. Am I wrong, or can dbContext be dynamic when accessing a database object?
I wonder if there is a third (better) solution? Or any other suggestion.
If you are troubled by this warning:
WARNING: Creating a duplicate database object for the same connection
but your intent is to maintain a separate pool per user, you can indicate so by providing any unique parameter for the connection. For example, you can include a custom property with the user name:
const cn = {
    database: 'my-db',
    port: 12345,
    user: 'my-login-user',
    password: 'my-login-password',
    // ...
    my_dynamic_user: 'john-doe'
};
This will be enough for the library to see that there is something unique in your connection, which doesn't match the other connections, and so it won't produce that warning.
This will work for connection strings as well.
Please note that what you are trying to achieve can only work well when the total number of connections well exceeds the number of users. For example, if you can use up to 100 connections with up to 10 users, then you can allocate 10 pools, each with up to 10 connections. Otherwise, the scalability of your system will suffer, because the total number of connections is a very limited resource: you would typically never go beyond 100 connections, as running so many physical connections concurrently creates excessive load on the CPU. That's why sharing a single connection pool scales much better.

How do I access Cosmos DB database or collection metrics via the Azure REST API?

I am attempting to use the Azure metrics API to retrieve metrics for Cosmos DB databases and collections.
I am able to use the metrics API to retrieve metrics for the Cosmos DB account itself, but I cannot figure out the resource URL for the databases or collections.
So this works:
public static async Task GetMetricsForCollection(ICosmosDBAccount cosmos, IDocumentClient client)
{
    var uriBuilder = new System.Text.StringBuilder();
    uriBuilder.Append($"https://management.azure.com{cosmos.Id}");
    uriBuilder.Append($"/providers/microsoft.insights/metricDefinitions?api-version=2018-01-01");
    //...Use uri to access API over HTTP
But I can't figure out how to get more specific metrics at deeper levels.
I found this post on the MSDN Community that says this should work:
public static async Task GetMetricsForCollection(ICosmosDBAccount cosmos, IDocumentClient client)
{
    var db = client.CreateDatabaseQuery().AsEnumerable().First();
    var uriBuilder = new System.Text.StringBuilder();
    //Use the database resource Id to retrieve the metrics
    uriBuilder.Append($"https://management.azure.com{cosmos.Id}/databases/{db.ResourceId}");
    uriBuilder.Append($"/providers/microsoft.insights/metricDefinitions?api-version=2018-01-01");
    //...Use uri to access API over HTTP
But it returns an error:
Response status code does not indicate success: 400
Microsoft.DocumentDB/databaseAccounts/databases is not a supported platform metric namespace, supported ones are
Microsoft.LocationBasedServices/accounts,Microsoft.EventHub/namespaces,Microsoft.EventHub/clusters,Microsoft.ServiceBus/namespaces,
Microsoft.KeyVault/vaults,Microsoft.ClassicCompute/domainNames/slots/roles,Microsoft.ClassicCompute/virtualMachines,
Microsoft.Network/publicIPAddresses,Microsoft.Network/networkInterfaces,Microsoft.Network/loadBalancers,
Microsoft.Network/networkWatchers/connectionMonitors,Microsoft.Network/virtualNetworkGateways,Microsoft.Network/connections,
Microsoft.Network/applicationGateways,Microsoft.Network/dnszones,Microsoft.Network/trafficmanagerprofiles,
Microsoft.Network/expressRouteCircuits,Microsoft.EventGrid/eventSubscriptions,Microsoft.EventGrid/topics,Microsoft.EventGrid/extensionTopics,
Microsoft.Batch/batchAccounts,Microsoft.TimeSeriesInsights/environments,Microsoft.TimeSeriesInsights/environments/eventsources,
Microsoft.OperationalInsights/workspaces,Microsoft.Maps/accounts,Microsoft.Sql/servers,Microsoft.Sql/servers/databases,
Microsoft.Sql/servers/elasticpools,Microsoft.AnalysisServices/servers,Microsoft.Compute/virtualMachines,
Microsoft.Compute/virtualMachineScaleSets,Microsoft.Compute/virtualMachineScaleSets/virtualMachines,Microsoft.DataFactory/dataFactories,
Microsoft.DataFactory/factories,Microsoft.Storage/storageAccounts,Microsoft.Storage/storageAccounts/blobServices,
Microsoft.Storage/storageAccounts/tableServices,Microsoft.Storage/storageAccounts/queueServices,
Microsoft.Storage/storageAccounts/fileServices,Microsoft.Logic/workflows,Microsoft.Automation/automationAccounts,
Microsoft.ContainerService/managedClusters,Microsoft.StorageSync/storageSyncServices,Microsoft.ApiManagement/service,
Microsoft.DBforMySQL/servers,Microsoft.DocumentDB/databaseAccounts,Microsoft.ContainerRegistry/registries,Microsoft.Search/searchServices,
microsoft.insights/components,microsoft.insights/autoscalesettings,Microsoft.DataLakeStore/accounts,Microsoft.Web/serverFarms,
Microsoft.Web/sites,Microsoft.Web/sites/slots,Microsoft.Web/hostingEnvironments/multiRolePools,Microsoft.Web/hostingEnvironments/workerPools,
Microsoft.HDInsight/clusters,Microsoft.NetApp/netAppAccounts/capacityPools,Microsoft.NetApp/netAppAccounts/capacityPools/volumes,
test.shoebox/testresources,test.shoebox/testresources2,Microsoft.NotificationHubs/namespaces/notificationHubs,Microsoft.CustomerInsights/hubs,
CloudSimple.PrivateCloudIaaS/virtualMachines,Microsoft.StreamAnalytics/streamingjobs,Microsoft.CognitiveServices/accounts,
Microsoft.Cache/Redis,Microsoft.Devices/IotHubs,Microsoft.Devices/ElasticPools,Microsoft.Devices/ElasticPools/IotHubTenants,
Microsoft.Devices/ProvisioningServices,Microsoft.SignalRService/SignalR,Microsoft.DataLakeAnalytics/accounts,
Microsoft.DBforPostgreSQL/servers,Microsoft.ContainerInstance/containerGroups,Microsoft.Relay/namespaces,
Microsoft.PowerBIDedicated/capacities
(So you don't have to read all of that, I can confirm that it doesn't mention collections or databases as being usable with this API.)
I've also tried it with db.Id instead of db.ResourceId with the same error.
I've also tried going to the collection with uriBuilder.Append($"https://management.azure.com{cosmos.Id}/databases/{db.ResourceId}/collections/{collection.ResourceId}"); but it also generates the same message.
I'm stumped.
After poking around a bit with network traces while exploring the Azure Portal, it looks like there are two types of Cosmos metrics: those that use the microsoft.insights provider and those that don't.
For those that use the provider, you can add the database and collection (the human-readable names, i.e. the .Id property) as filters to the metrics API:
public static async Task GetMetricsForCollection(ICosmosDBAccount cosmos, IDocumentClient client)
{
    var db = client.CreateDatabaseQuery().AsEnumerable().First();
    var dbUri = UriFactory.CreateDatabaseUri(db.Id);
    var collection = client.CreateDocumentCollectionQuery(dbUri).AsEnumerable().First();
    var uriBuilder = new System.Text.StringBuilder();
    //Filter by the human-readable database and collection names (.Id)
    uriBuilder.Append($"https://management.azure.com{cosmos.Id}");
    uriBuilder.Append($"/providers/microsoft.insights/metricDefinitions?api-version=2018-01-01");
    uriBuilder.Append($"&$filter=DatabaseName eq '{db.Id}' and CollectionName eq '{collection.Id}'");
For those that don't, you can append /metrics to the resource URI suggested by the linked forum post. In this case, you need to use the .ResourceId properties.
It also looks like a filter parameter is required. I'm just copying and pasting from the Azure portal's network traces, as I don't believe this is documented anywhere, but it ends up looking something like this:
public static async Task GetMetricsForCollection(ICosmosDBAccount cosmos, IDocumentClient client)
{
    var db = client.CreateDatabaseQuery().AsEnumerable().First();
    var dbUri = UriFactory.CreateDatabaseUri(db.Id);
    var collection = client.CreateDocumentCollectionQuery(dbUri).AsEnumerable().First();
    var uriBuilder = new System.Text.StringBuilder();
    uriBuilder.Append($"https://management.azure.com{cosmos.Id}");
    uriBuilder.Append($"/databases/{db.ResourceId}/collections/{collection.ResourceId}/metrics?api-version=2014-04-01");
    uriBuilder.Append($"&$filter=(name.value eq 'Available Storage' or name.value eq 'Data Size' or name.value eq 'Index Size') and endTime eq 2018-06-22T12%3A35%3A00.000Z and startTime eq 2018-06-22T11%3A35%3A00.000Z and timeGrain eq duration'PT5M'");
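To actually issue the request, a minimal sketch could look like the following (assuming accessToken already holds a valid Azure AD bearer token for https://management.azure.com/; acquiring the token is out of scope here):

// Minimal sketch: issue the GET against the ARM URI built above.
// "accessToken" is an assumption, not something the original code provides.
using (var httpClient = new HttpClient())
{
    httpClient.DefaultRequestHeaders.Authorization =
        new System.Net.Http.Headers.AuthenticationHeaderValue("Bearer", accessToken);
    var response = await httpClient.GetAsync(uriBuilder.ToString());
    response.EnsureSuccessStatusCode();
    var json = await response.Content.ReadAsStringAsync();
    // "json" now contains the metric definitions/values, ready for parsing.
}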

Convert QueueClient.Create to MessagingFactory.CreateQueueClient

I'm trying to convert an implementation using the .NET library from QueueClient.Create to MessagingFactory.CreateQueueClient, both to better control the BatchFlushInterval and to allow the use of multiple factories over multiple connections to increase send throughput, but I'm running into roadblocks.
Right now we are creating QueueClients (they are maintained throughout the app) like this:
QueueClient.CreateFromConnectionString(address, queueName, ReceiveMode.PeekLock); // address is the connection string from the azure portal in the form of Endpoint=sb....
I'm trying to change it to create a MessagingFactory in the class constructor that will then be used to create the QueueClients:
messagingFactory = MessagingFactory.Create(address.Replace("Endpoint=", ""), mfs);
// later on, in another part of the class
messagingFactory.CreateQueueClient(queueName, ReceiveMode.PeekLock);
// error: Endpoint not found.
This throws the error "Endpoint not found." If I don't strip the "Endpoint=" prefix, it won't even create the MessagingFactory. What is the proper way to handle this?
Notes:
address = Endpoint=sb://pmg-bus-mybus.servicebus.windows.net/;SharedAccessKeyName=RootManageSharedAccessKey;SharedAccessKey=somekey
As an aside, we have a process that is trying to push as many messages as possible to a queue, with others reading from it. The readers seem to easily keep up with the sender, and I'm trying to maximize the send rate.
The address is the base address of the namespace (sb://yournamespace.servicebus.windows.net/) you are connecting to, not the full connection string. For more information, please refer to MessagingFactory. The following is demo code:
var address = "sb://yournamespace.servicebus.windows.net/"; // base address of the namespace you are connecting to
var msgFactorySettings = new MessagingFactorySettings
{
    NetMessagingTransportSettings = new NetMessagingTransportSettings
    {
        BatchFlushInterval = TimeSpan.FromSeconds(2)
    },
    TokenProvider = TokenProvider.CreateSharedAccessSignatureTokenProvider("RootManageSharedAccessKey", "balabala..."),
    OperationTimeout = TimeSpan.FromSeconds(30) // specify the operation timeout (optional)
};
MessagingFactory messagingFactory = MessagingFactory.Create(address, msgFactorySettings);
var queue = messagingFactory.CreateQueueClient("queueName", ReceiveMode.PeekLock);
var message = queue.Receive(TimeSpan.Zero);
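If you would rather keep passing the full connection string around, a minimal sketch (assuming the WindowsAzure.ServiceBus package) is to parse it with ServiceBusConnectionStringBuilder and derive the base address and token provider from it, instead of hand-editing the string:

// Minimal sketch: derive the pieces MessagingFactory.Create needs from the
// full "Endpoint=sb://..." connection string.
var csBuilder = new ServiceBusConnectionStringBuilder(address);
var baseAddress = csBuilder.Endpoints.First(); // e.g. sb://pmg-bus-mybus.servicebus.windows.net/
var tokenProvider = TokenProvider.CreateSharedAccessSignatureTokenProvider(
    csBuilder.SharedAccessKeyName, csBuilder.SharedAccessKey);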
