getState() on IBM Blockchain v2: experiencing long load times - hyperledger-fabric

I have deployed a single organization network on IBM Blockchain v2. I am experiencing very slow load times (always over 3 seconds for a single asset.)
I have bumped up specs on the Kubernetes cluster. I have also adjusted some of the resource allocations, but the load time didn't budge.
async query(ctx, key) {
console.info('query by key ' + key);
let returnAsBytes = await ctx.stub.getState(key);
console.info(returnAsBytes)
if (!returnAsBytes || returnAsBytes.length === 0) {
return new Error(`${key} does not exist`);
}
let result = JSON.parse(returnAsBytes);
console.info('result of getState: ');
console.info(result);
return JSON.stringify(result);
}
I am wondering if there is a way to get faster results. I also haven't found been able to find many resources on proper deployments for IBM Blockchain v2, so I'm unsure if I'm doing something incorrectly.

Unfortunately you haven't provided enough information, however one area which can have a performance impact is on the client side application where an incorrect pattern for every request used would be
create gateway/connect gateway/submit transaction or evaluate transaction/disconnect gateway.
This jira https://jira.hyperledger.org/projects/FABN/issues/FABN-1319 provides some detail about the gateway lifecycle. But a quick one line suggestion is, don't create gateways all the time, cache them and use a stale policy to disconnect them after a period of non-use. Note gateways are bound to an identity so you would have a gateway for each identity

Related

Google Pub/Sub with distributed subscribers in Node.js

We are attempting to migrate a message processing app from Kafka to Google Pub/Sub and it's just not working as expected.
We are running in Kubernetes (Google Cloud) where there may be multiple pods processing messages on the same subscription. Topics and subscriptions are all created using terraform and are more or less permanent. They are not created/destroyed on the fly by the application.
In our development environment, where message throughput is rather low, everything works just fine. But when we scale up to production levels, everything seems to fall apart. We get big backlogs of unacked messages, and yet some pods are not receiving any messages at all. And then, all of a sudden, the backlog will just go away, but then climb again.
We are using the nodejs client library provided by google: #google-cloud/pubsub:3.1.0
Each instance of the application subscribes to the same named subscription, and according to the documentation, messages should be distributed to each subscriber. But that is not happening. Some pods will be consuming messages rapidly, while others sit idle.
Every message is processed in a try/catch block and we are not observing any errors being thrown. So, as far as we know, every received message is getting acked.
I am suspicious that, as pods are terminated with autoscaling or updated deployments, that we are not properly closing subscriptions, but there are no examples addressing a distributed environment and I have not found any document that specifically addresses how to properly manage resources. It is also worth mentioning that the app has multiple subscriptions to different topics.
When a pod shuts down, what actions should be taken on the Subscription object and the PubSub client object? Maybe that's not even the issue, but it seems like a reasonable place to start.
When we start a subscription we do something like this:
private exampleSubscribe(): Subscription {
// one suggestion for having multiple subscriptions in the same app
// was to use separate clients for each
const pubSubClient = new PubSub({
// use a regional endpoint for message ordering
apiEndpoint: 'us-central1-pubsub.googleapis.com:443',
});
pubSubClient.projectId = 'my-project-id';
const sub = pubSubClient.subscription('my-subscription-name', {
// have tried various values for maxMessage from 5 to the default of 1000
flowControl: { maxMessages: 250, allowExcessMessages: false },
ackDeadline: 30,
});
sub.on('message', async (message) => {
await this.exampleMessageProcessing(message);
});
return sub;
}
private async exampleMessageProcessing(message: Message): Promise<void> {
try {
// do some cool stuff
} catch (error) {
// log the error
} finally {
message.ack();
}
}
Upon termination of a pod, we do this:
private async exampleCloseSub(sub: Subscription) {
try {
sub.removeAllListeners('message');
await sub.close();
// note that we do nothing with the PubSub
// client object -- should it also be closed?
} catch (error) {
// ignore error, we are shutting down
}
}
When running with Kafka, we can easily keep up with the message pace with usually no more than 2 pods. So I know that we are not running into issues of it simply taking too long to process each message.
Why are messages being left unacked? Why are pods not receiving messages when there is clearly a large backlog? What is the correct way to shut down one subscriber on a shared subscription?
It turns out that the issue was an improper implementation of message ordering.
The official docs for message ordering in Pub/Sub are rather brief:
https://cloud.google.com/pubsub/docs/ordering
Not much there regarding how to implement an ordering key or the implications of message ordering on horizontal scaling.
Though they do link to some external resources, one of which is this blog post:
https://medium.com/google-cloud/google-cloud-pub-sub-ordered-delivery-1e4181f60bc8
In our case, we did not have enough distinct ordering keys to allow for proper distribution of messages across subscribers/pods.
So this was definitely an RTFM situation, or more accurately: Read The Fine Blog Post Referred To By The Manual. I would have much preferred that the important details were actually in the official documentation. Is that to much to ask for?

Random 21/42 seconds timeout in outgoing traffic on Azure Web Sites

I have an ASP.NET MVC 5 application running in the azure german cloud as Azure Web App (single instance - Standard S3 size).
I'm calling a non azure hosted REST/SOAP service on a particular host and the web requests either succeed promptly or timeout after 21 / 42 seconds.
I've load tested the requests and the percentile of requests timing out is between 20 and 80.
One particular remarkable property of the timeout is, that they occur after exactly 21 or 42 seconds (this is serious, no reference to hitchhiker's guide to the galaxy intended).
Calling a different service from the web app works just fine, temporarily at least.
We've already checked the firewall of the non azure service and if the timeout occurs, not a single packet reached the host.
This issue occurred once in the past one year ago and support was unable to tell what the cause was until the issue suddenly went away roughly two weeks after first occuring, so the ticket got closed as fixed itself but now its back.
The code is using https://github.com/canton7/RestEase (uses HttpClient underneath) and looks like
[Header("Content-Type", "application/json")]
public interface IApi
{
[Post("/Login")]
Task<LoginToken> Login([Body]LoginRequest request);
}
private static Dictionary<string, IApi> ApiClientsByHost = new Dictionary<string, IApi>();
private IApi GetApiForHost(string host)
{
if (!ApiClientsByHost.TryGetValue(host, out var client))
{
lock (ApiClientsByHost)
{
if (!ApiClientsByHost.TryGetValue(host, out client))
{
ApiClientsByHost[host] = client = RestClient.For<IApi>(host);
}
}
}
return client;
}
var client = GetApiForHost("https://production/");
var loginToken = await client.Login(new LoginRequest { Username = username, Password = password });
By different service, i mean using "https://testserver/" instead of "https://production/" (testserver is located in a different data center with different IP and all).
The API authentication is passing a token via query but it timeouts already before being able to get a token.
The code is caching the IApi to avoid the TCP starvation problems of disposing HttpClients (but i've never run into port exhaustion).
Restarting the app does not resolve the issue and the issue only occurs to production currently (but a year ago, when this issue occurred on production, we've switched to testserver which worked initially but after some time, ran into the same problem)
EDIT: Found some explanation in the last answer as to where those magical 21 seconds are comming from.
EDIT: One way i've found to workaround is, is to setup a azure vm with a proxy on it and configure defaultProxy to pass through that vm.
That's TCP retransmission timing out. It's odd that you are getting different values though.

Azure Socket Leaks?

I have an ASP.NET Core a website with a lot of simultaneous users which crashes many times during the day and I scaled up and out but no luck.
I have been told my numerous Azure support staff that the issue is that I'm sending out a lot of database calls although database utilization improved after creating indexes. Can you kindly advise what you think the problem is as I have done my best...
I was told that I have "socket leaks".
Please note:
I don't have any external service calls except to sendgrid
I have not used ConfigureAwait(false)
I'm not using "using" statements or explicitly disposing contexts
This is my connection string If it may help...
Server=tcp:sarahah.database.windows.net,1433;Initial Catalog=SarahahDb;Persist Security Info=False;User ID=********;Password=******;MultipleActiveResultSets=True;Encrypt=True;TrustServerCertificate=False;Connection Timeout=30;Max Pool Size=400;
These are some code examples:
In Startup.CS:
services.AddDbContext<ApplicationDbContext>(options =>
options.UseSqlServer(Configuration.GetConnectionString("DefaultConnection")));
Main class:
private readonly ApplicationDbContext _context;
public MessagesController(ApplicationDbContext context, IEmailSender emailSender, UserManager<ApplicationUser> userManager)
{
_context = context;
_emailSender = emailSender;
_userManager = userManager;
}
This an important method code for example:
string UserId = _userManager.GetUserId(User);
var user = await _context.Users.Where(u => u.Id.Equals(UserId)).Include(u => u.Messages).FirstOrDefaultAsync();
// some other code
return View(user.Messages);
Please advise as I have tried my best but this is very embarrassing to me in font of my customers.
Without the error messages that you're seeing, here's a few ideas that you can check.
I'd start with going to your Web App's Overview blade in the Azure Portal. Update the monitoring graph to a time period when you're experiencing problems. Are you CPU bound? Have you exhausted memory? Also, check the HTTP Queue length. If your HTTP queue is really long, it's because your server is choking trying to service the requests and users are experiencing timeout issues.
Next, jump over to your SQL Server's Overview blade in the Azure Portal, and look at the resource utilization chart. Set the time period on the chart to when you're experiencing problems. Have you pegged out your DTUs for your database? If so, it's a sign of poor indexing, poor schema design, or you're just undersized and need to scale up.
Turn on ApplicationInsights if you haven't already. You can use the ApplicationInsights API to insert your own trace statements into your code. Or, you might be able to see exceptions causing the issue without having to do your own tracing.
Check the Kudu logs for your Web Apps.
I agree with Tseng - your usage of EF and .NET Core's DI framework looks correct.
Let us know how the troubleshooting goes and provide additional information on exactly what kind of errors you're seeing. Best of luck!
It looks like a DI issue to me. You are injecting ApplicationDbContext context. Which means the ApplicationDbContext will be resolved from the DI container meaning it will stay open the entire request (transient) as Tseng pointed out. It should be a scoped.
You can inject IServiceScopeFactory scopeFactory in your controller and do something like:
using (var scope = _scopeFactory.CreateScope())
{
var context = scope.ServiceProvider.GetRequiredService<ApplicationDbContext>();
}
Note that if you are using ASP.NET Core 1.1 and want to be sure that all your services are being resolved correctly change your ConfigureService method in the Startup to:
public IServiceProvider ConfigureServices(IServiceCollection services)
{
// Register services
return services.BuildServiceProvider(validateScopes: true);
}

Limiting the number of concurrent jobs on Azure Functions queue

I have a Function app in Azure that is triggered when an item is put on a queue. It looks something like this (greatly simplified):
public static async Task Run(string myQueueItem, TraceWriter log)
{
using (var client = new HttpClient())
{
client.BaseAddress = new Uri(Config.APIUri);
client.DefaultRequestHeaders.Accept.Add(new MediaTypeWithQualityHeaderValue("application/json"));
StringContent httpContent = new StringContent(myQueueItem, Encoding.UTF8, "application/json");
HttpResponseMessage response = await client.PostAsync("/api/devices/data", httpContent);
response.EnsureSuccessStatusCode();
string json = await response.Content.ReadAsStringAsync();
ApiResponse apiResponse = JsonConvert.DeserializeObject<ApiResponse>(json);
log.Info($"Activity data successfully sent to platform in {apiResponse.elapsed}ms. Tracking number: {apiResponse.tracking}");
}
}
This all works great and runs pretty well. Every time an item is put on the queue, we send the data to some API on our side and log the response. Cool.
The problem happens when there's a big spike in "the thing that generates queue messages" and a lot of items are put on the queue at once. This tends to happen around 1,000 - 1,500 items in a minute. The error log will have something like this:
2017-02-14T01:45:31.692 mscorlib: Exception while executing function:
Functions.SendToLimeade. f-SendToLimeade__-1078179529: An error
occurred while sending the request. System: Unable to connect to the
remote server. System: Only one usage of each socket address
(protocol/network address/port) is normally permitted
123.123.123.123:443.
At first, I thought this was an issue with the Azure Function app running out of local sockets, as illustrated here. However, then I noticed the IP address. The IP address 123.123.123.123 (of course changed for this example) is our IP address, the one that the HttpClient is posting to. So, now I'm wondering if it is our servers running out of sockets to handle these requests.
Either way, we have a scaling issue going on here. I'm trying to figure out the best way to solve it.
Some ideas:
If it's a local socket limitation, the article above has an example of increasing the local port range using Req.ServicePoint.BindIPEndPointDelegate. This seems promising, but what do you do when you truly need to scale? I don't want this problem coming back in 2 years.
If it's a remote limitation, it looks like I can control how many messages the Functions runtime will process at once. There's an interesting article here that says you can set serviceBus.maxConcurrentCalls to 1 and only a single message will be processed at once. Maybe I could set this to a relatively low number. Now, at some point our queue will be filling up faster than we can process them, but at that point the answer is adding more servers on our end.
Multiple Azure Functions apps? What happens if I have more than one Azure Functions app and they all trigger on the same queue? Is Azure smart enough to divvy up the work among the Function apps and I could have an army of machines processing my queue, which could be scaled up or down as needed?
I've also come across keep-alives. It seems to me if I could somehow keep my socket open as queue messages were flooding in, it could perhaps help greatly. Is this possible, and any tips on how I'd go about doing this?
Any insight on a recommended (scalable!) design for this sort of system would be greatly appreciated!
I think the code error is because of: using (var client = new HttpClient())
Quoted from Improper instantiation antipattern:
this technique is not scalable. A new HttpClient object is created for
each user request. Under heavy load, the web server may exhaust the
number of available sockets.
I think I've figured out a solution for this. I've been running these changes for the past 3 hours 6 hours, and I've had zero socket errors. Before I would get these errors in large batches every 30 minutes or so.
First, I added a new class to manage the HttpClient.
public static class Connection
{
public static HttpClient Client { get; private set; }
static Connection()
{
Client = new HttpClient();
Client.BaseAddress = new Uri(Config.APIUri);
Client.DefaultRequestHeaders.Add("Connection", "Keep-Alive");
Client.DefaultRequestHeaders.Add("Keep-Alive", "timeout=600");
Client.DefaultRequestHeaders.Accept.Add(new MediaTypeWithQualityHeaderValue("application/json"));
}
}
Now, we have a static instance of HttpClient that we use for every call to the function. From my research, keeping HttpClient instances around for as long as possible is highly recommended, everything is thread safe, and HttpClient will queue up requests and optimize requests to the same host. Notice I also set the Keep-Alive headers (I think this is the default, but I figured I'll be implicit).
In my function, I just grab the static HttpClient instance like:
var client = Connection.Client;
StringContent httpContent = new StringContent(myQueueItem, Encoding.UTF8, "application/json");
HttpResponseMessage response = await client.PostAsync("/api/devices/data", httpContent);
response.EnsureSuccessStatusCode();
I haven't really done any in-depth analysis of what's happening at the socket level (I'll have to ask our IT guys if they're able to see this traffic on the load balancer), but I'm hoping it just keeps a single socket open to our server and makes a bunch of HTTP calls as the queue items are processed. Anyway, whatever it's doing seems to be working. Maybe someone has some thoughts on how to improve.
If you use consumption plan instead of Functions on a dedicated web app, #3 more or less occurs out of the box. Functions will detect that you have a large queue of messages and will add instances until queue length stabilizes.
maxConcurrentCalls only applies per instance, allowing you to limit per-instance concurrency. Basically, your processing rate is maxConcurrentCalls * instanceCount.
The only way to control global throughput would be to use Functions on dedicated web apps of the size you choose. Each app will poll the queue and grab work as necessary.
The best scaling solution would improve the load balancing on 123.123.123.123 so that it can handle any number of requests from Functions scaling up/down to meet queue pressure.
Keep alive afaik is useful for persistent connections, but function executions aren't viewed as a persistent connection. In the future we are trying to add 'bring your own binding' to Functions, which would allow you to implement connection pooling if you liked.
I know the question was answered long ago, but in the mean time Microsoft have documented the anti-pattern that you were using.
Improper Instantiation antipattern

How to manage centralized values in a sharded environment

I have an ASP.NET app being developed for Windows Azure. It's been deemed necessary that we use sharding for the DB to improve write times since the app is very write heavy but the data is easily isolated. However, I need to keep track of a few central variables across all instances, and I'm not sure the best place to store that info. What are my options?
Requirements:
Must be durable, can survive instance reboots
Must be synchronized. It's incredibly important to avoid conflicting updates or at least throw an exception in such cases, rather than overwriting values or failing silently.
Must be reasonably fast (2000+ read/writes per second
I thought about writing a separate component to run on a worker role that simply reads/writes the values in memory and flushes them to disk every so often, but I figure there's got to be something already written for that purpose that I can appropriate in Windows Azure.
I think what I'm looking for is a system like Apache ZooKeeper, but I dont' want to have to deal with installing the JRE during the worker role startup and all that jazz.
Edit: Based on the suggestion below, I'm trying to use Azure Table Storage using the following code:
var context = table.ServiceClient.GetTableServiceContext();
var item = context.CreateQuery<OfferDataItemTableEntity>(table.Name)
.Where(x => x.PartitionKey == Name).FirstOrDefault();
if (item == null)
{
item = new OfferDataItemTableEntity(Name);
context.AddObject(table.Name, item);
}
if (item.Allocated < Quantity)
{
allocated = ++item.Allocated;
context.UpdateObject(item);
context.SaveChanges();
return true;
}
However, the context.UpdateObject(item) call fails with The context is not currently tracking the entity. Doesn't querying the context for the item initially add it to the context tracking mechanism?
Have you looked into SQL Azure Federations? It seems like exactly what you're looking for:
sharding for SQL Azure.
Here are a few links to read:
http://msdn.microsoft.com/en-us/library/windowsazure/hh597452.aspx
http://convective.wordpress.com/2012/03/05/introduction-to-sql-azure-federations/
http://searchcloudapplications.techtarget.com/tip/Tips-for-deploying-SQL-Azure-Federations
What you need is Table Storage since it matches all your requirements:
Durable: Yes, Table Storage is part of a Storage Account, which isn't related to a specific Cloud Service or instance.
Synchronized: Yes, Table Storage is part of a Storage Account, which isn't related to a specific Cloud Service or instance.
It's incredibly important to avoid conflicting updates: Yes, this is possible with the use of ETags
Reasonably fast? Very fast, up to 20,000 entities/messages/blobs per second
Update:
Here is some sample code that uses the new storage SDK (2.0):
var storageAccount = CloudStorageAccount.DevelopmentStorageAccount;
var table = storageAccount.CreateCloudTableClient()
.GetTableReference("Records");
table.CreateIfNotExists();
// Add item.
table.Execute(TableOperation.Insert(new MyEntity() { PartitionKey = "", RowKey ="123456", Customer = "Sandrino" }));
var user1record = table.Execute(TableOperation.Retrieve<MyEntity>("", "123456")).Result as MyEntity;
var user2record = table.Execute(TableOperation.Retrieve<MyEntity>("", "123456")).Result as MyEntity;
user1record.Customer = "Steve";
table.Execute(TableOperation.Replace(user1record));
user2record.Customer = "John";
table.Execute(TableOperation.Replace(user2record));
First it adds the item 123456.
Then I'm simulating 2 users getting that same record (imagine they both opened a page displaying the record).
User 1 is fast and updates the item. This works.
User 2 still had the window open. This means he's working on an old version of the item. He updates the old item and tries to save it. This causes the following exception (this is possible because the SDK matches the ETag):
The remote server returned an error: (412) Precondition Failed.
I ended up with a hybrid cache / table storage solution. All instances track the variable via Azure caching, while the first instance spins up a timer that saves the value to table storage once per second. On startup, the cache variable is initialized with the value saved to table storage, if available.

Resources