1000 rows limit for chef-api module/wrapper - node.js

So I'm using this Node module to connect to Chef from my API:
https://github.com/normanjoyner/chef-api
It exposes a method called "partialSearch" which fetches selected attributes for all nodes that match a given query. The problem I have is that one of our environments has 1386 nodes attached to it, but the module only seems to return a maximum of 1000.
There does not seem to be any way to "offset" the results. The module otherwise works pretty well, and it's a shame this feature is not implemented, since its absence really limits the module's usefulness.
Has anyone bumped into a similar issue with this module and can advise how to work around it?
Here is an extract of my code:
chef.config(SetOptions(environment));
console.log("About to search for any servers ...");
chef.partialSearch('node',
  {
    q: "name:*"
  },
  {
    name: ['name'],
    ipaddress: ['ipaddress'],
    chef_environment: ['chef_environment'],
    ip6address: ['ip6address'],
    run_list: ['run_list'],
    chef_client: ['chef_client'],
    ohai_time: ['ohai_time']
  },
  function (err, chefRes) {
    // ... handle err / chefRes here ...
  });
Regards!

The maximum is 1000 results per page, but you can still request the pages in order. The Chef API doesn't have a formal cursor system for pagination, so paging is just separate requests with a different start value. That can occasionally lead to minor desync (an item at the end of one page might shift in ordering and also show up at the start of the next page), so make sure you handle duplicates. That said, the fancy API in the client library you linked doesn't seem to expose that option, so you'll have to add it or otherwise work around the problem. Check out https://github.com/sethvargo/chef-api/blob/master/lib/chef-api/resources/partial_search.rb#L34 for a Ruby implementation that does handle it.
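If you do end up patching the module (or hitting the Chef server's /search/<index> endpoint yourself), the paging loop is straightforward. Below is a minimal sketch under those assumptions: searchPage is a hypothetical helper that forwards the Chef search API's standard start and rows query parameters (the stock chef-api module does not expose them), and the exact row shape depends on how you call the endpoint.

// Sketch only: page through a partial search by repeating the request with an
// increasing `start` value, de-duplicating across page boundaries.
// `searchPage(index, opts, keys, cb)` is a hypothetical helper that passes
// opts.start and opts.rows through to the Chef search API.
function fetchAllNodes(searchPage, done) {
  var keys = { name: ['name'], ipaddress: ['ipaddress'] };
  var pageSize = 1000;
  var seen = {};
  var results = [];

  function nextPage(start) {
    searchPage('node', { q: 'name:*', start: start, rows: pageSize }, keys,
      function (err, res) {
        if (err) { return done(err); }
        res.rows.forEach(function (row) {
          var name = (row.data && row.data.name) || row.name;
          if (!seen[name]) { seen[name] = true; results.push(row); }
        });
        if (start + pageSize >= res.total) { return done(null, results); }
        nextPage(start + pageSize);
      });
  }
  nextPage(0);
}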

We have run into similar issues with Chef libraries. One workaround you might find useful: if you have some node attribute that you can use to segment all of your nodes into smaller groups of fewer than 1000, you can query each group separately.
If you have no such naturally segmentation-friendly attribute already, a simple implementation would be to create a new attribute called segment and, during your chef runs, set the attribute's value randomly to a number between 1 and 5.
Now you can perform 5 queries (each query searching a single segment) and you should find all your nodes; if the randomness is working, each group will contain roughly 277 nodes (1386/5).
As your node population grows you'll need to keep increasing the number of segments to ensure the segment sizes are less than 1000.
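As a concrete illustration (a sketch only, using the same partialSearch call shape as in the question, and assuming the attribute is literally named segment and the response exposes a rows array), the per-segment queries could be merged like this:

// Sketch only: one partialSearch per segment, merged once all have returned.
// Assumes each node sets a `segment` attribute (1..5) during its chef-client run
// and that the callback's response object exposes a `rows` array.
var SEGMENTS = [1, 2, 3, 4, 5];
var allRows = [];
var pending = SEGMENTS.length;

SEGMENTS.forEach(function (segment) {
  chef.partialSearch('node',
    { q: 'segment:' + segment },
    { name: ['name'], ipaddress: ['ipaddress'] },
    function (err, chefRes) {
      if (err) {
        console.error('segment ' + segment + ' failed:', err);
      } else {
        allRows = allRows.concat(chefRes.rows);
      }
      if (--pending === 0) {
        console.log('found ' + allRows.length + ' nodes in total');
      }
    });
});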

Related

How do I calculate TVL on-chain?

I am working on an Anchor / Solana program that provides liquidity to a number of pools, including saber.so and invariant.app. During the swap, I need to calculate the TVL to provision a token at a fair exchange rate.
My question is: what is the best way to calculate TVL on-chain?
The following are some approaches that I have in mind, each one with its shortcomings:
(1) Calculate off-chain, and provide this as an oracle:
We could calculate the TVL off-chain and then provide this TVL as an oracle. The shortcoming: Chainlink (an oracle provider) on Solana does not seem to support custom data feeds the way it does on Ethereum.
Furthermore, this solution increases the centralization of the app; it would be nice to have it on-chain. There could also be oracle attacks which drain the reserves of the protocol.
(2) Have a giant list of liquidity positions:
Another approach would be to keep track of all liquidity positions that we as a protocol have provided liquidity to. Although this is possible, I believe this would very quickly hit Solana's account size limit.
In this case, we would have a huge "state" account which tracks the following variables per pool:
token1_mint: Pubkey
token2_mint: Pubkey
token1_amount: u64
token2_amount: u64
token1_to_currency_pyth_feed_address: Pubkey
token2_to_currency_pyth_feed_address: Pubkey
provider: u8
Given roughly 4 * 32 bytes + 2 * 8 bytes + 1 byte = 145 bytes per pool (a Pubkey is 32 bytes, a u64 is 8 bytes, a u8 is 1 byte), we could track on the order of 28 pools at any given point in time (because of a 4KB account limit on Solana).
The second option seems like the way to go, as the first one is off-chain and prone to oracle attacks. However, the second option still seems a bit hacky, as I would have to touch this data structure any time I intend to calculate the total TVL.
Are there any other design ideas that come to your mind, or that you have seen, that would be appropriate?
I don't know enough about the overall design of your program to give you a good solution. I also don't know what Invariant is; maybe that breaks what I'm about to describe below.
I assume that you have some instruction in your program which CPI-calls into Saber etc. and opens a liquidity position. Assuming that instruction creates an account on chain with the following information:
pool_address,
token_1_mint_address
token_2_mint_address
amount_token_1
amount_token_2
...
One simple solution is to loop through all those accounts, and since you have the amount and mint of each token, you can calculate the value using something like the Pyth price oracle. I wouldn't do this on chain though, since it can get expensive fast! It's probably best to do it on the client side and write the result back to the chain.
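A minimal client-side sketch of that loop might look like the following. It assumes @solana/web3.js, that your program's position accounts are reachable via getProgramAccounts, and two hypothetical helpers (decodePosition and getPythPrice) for deserializing the account data and looking up a USD price; none of those helpers come from the question.

// Sketch only: client-side TVL over the program's position accounts.
// `decodePosition` and `getPythPrice` are hypothetical helpers -- decoding the
// account layout and reading the Pyth feed depend on your program.
const { Connection, PublicKey } = require('@solana/web3.js');

async function calculateTvl(programId, decodePosition, getPythPrice) {
  const connection = new Connection('https://api.mainnet-beta.solana.com');
  const accounts = await connection.getProgramAccounts(new PublicKey(programId));

  let tvl = 0;
  for (const { account } of accounts) {
    // pos: { amount1, amount2, token1PythFeed, token2PythFeed, ... }
    const pos = decodePosition(account.data);
    tvl += pos.amount1 * (await getPythPrice(pos.token1PythFeed));
    tvl += pos.amount2 * (await getPythPrice(pos.token2PythFeed));
  }
  return tvl; // could then be written back on chain via a program instruction
}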
The recent Solana bootcamp videos actually have a tutorial on bringing off-chain info back onto the chain! https://www.youtube.com/watch?v=GwhRWde3Ckw&t=385s
Below is a demo of the runtime limitations of on-chain programs. You could perhaps do the loop through the pool accounts on chain if you use some indexing and PDAs to find the account addresses, and assuming that you have a limited number of liquidity positions. However, I wouldn't hardcode all the information into a single account; that seems like an unsustainable approach that might cause a lot of issues down the line. It might give you better performance, however; I'm not sure.
https://www.youtube.com/watch?v=5IrfSecDPeA&t=1191s
Anyways GL!

KRPC protocol acts weird in BEP-05

According to BEP-05, when you send a find_node or get_peers query, you should receive either the target's contact information or the K (8) good nodes closest to the target/infohash.
However, in my case, with the bootstrap node router.utorrent.com:6881, the remote returned the 8 nodes closest to my own node ID. And for a get_peers request, it always returned the 8 nodes closest to my own ID plus 7 invalid peers. But once I reach some node that redirects me close to the infohash, the protocol behaves normally.
weird wireshark dump
success wireshark dump
Any help would be appreciated!
You shouldn't pay too much attention to what the bootstrap nodes do as long as they allow you to populate your routing table, since that is their primary purpose.
They receive a disproportionate amount of traffic and to avoid directing undue amounts of traffic to any particular node they may deviate from the specification in a few ways that are harmless as long as only a vanishingly small fraction of the network behaves that way. There is only a single-digit number of bootstrap nodes among millions, so their behavior is negligible and should not be taken as a reference point.
It does not make sense to contact a bootstrap node via get_peers either; find_node queries are the correct choice for populating your routing table. And it is only necessary to contact bootstrap nodes in the relatively rare case where other mechanisms have not been successful.
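For reference (a sketch only, assuming Node's built-in dgram module plus the bencode package from npm; the IDs are random placeholders and the "aa" transaction ID follows the BEP-05 example), a minimal find_node query to the bootstrap node looks like this:

// Sketch only: send a single KRPC find_node query over UDP and print the reply.
const dgram = require('dgram');
const crypto = require('crypto');
const bencode = require('bencode'); // npm install bencode

const nodeId = crypto.randomBytes(20);   // our (random placeholder) DHT node ID
const target = crypto.randomBytes(20);   // the ID we are looking for

const query = bencode.encode({
  t: 'aa',            // transaction ID
  y: 'q',             // this message is a query
  q: 'find_node',
  a: { id: nodeId, target: target }
});

const socket = dgram.createSocket('udp4');
socket.on('message', function (msg) {
  const res = bencode.decode(msg);
  // res.r.nodes is compact node info: 26 bytes per node (20-byte ID + IP + port)
  console.log('got', res.r.nodes.length / 26, 'nodes');
  socket.close();
});
socket.send(query, 6881, 'router.utorrent.com');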

How can I get issues since any number?

var xhr = new XMLHttpRequest();
xhr.open('GET', 'https://api.github.com/repos/vuejs/vue/issues');
xhr.onload = function () {
  console.log(JSON.parse(xhr.responseText)); // top 30 most recent issues
};
xhr.send();
With the above code, I can receive the top 30 issues of the vue project. But if I want to get the top 30 issues whose issue number is less than 8000, how can I do it?
In the GitHub v3 API docs, there is only a feature that allows you to get issues since a given point in time.
One way using API v3 would be to traverse the issues and find the ones you want. The call to the Issues API returns issues in descending order of creation date, which means you just need to walk through the pages until you find issues with a number lower than 8000.
In the particular case of vuejs/vue, you can increase the number of issues displayed per page to 100 and then find the issues numbered below 8000 on the second page:
https://api.github.com/repos/vuejs/vue/issues?per_page=100&page=2
I feel this is a better option than using the issue Search API (v3), since you do not have to deal with the very low rate limit of the GitHub Search APIs.
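A minimal sketch of that traversal (using XMLHttpRequest as in the question; the per_page value and the 8000 cutoff are just the numbers from this thread):

// Sketch only: walk the paginated Issues API in creation order and keep the
// first `wanted` issues whose number is below `maxNumber`.
function fetchIssuesBelow(maxNumber, wanted, done) {
  var results = [];
  function fetchPage(page) {
    var xhr = new XMLHttpRequest();
    xhr.open('GET', 'https://api.github.com/repos/vuejs/vue/issues' +
                    '?per_page=100&page=' + page);
    xhr.onload = function () {
      var issues = JSON.parse(xhr.responseText);
      issues.forEach(function (issue) {
        if (issue.number < maxNumber && results.length < wanted) {
          results.push(issue);
        }
      });
      if (results.length >= wanted || issues.length === 0) {
        done(results);
      } else {
        fetchPage(page + 1);
      }
    };
    xhr.send();
  }
  fetchPage(1);
}

// Usage: the top 30 issues numbered below 8000
fetchIssuesBelow(8000, 30, function (issues) { console.log(issues); });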

Objective-C for loops with @autoreleasepool and ARC

As part of an app that allows auditors to create findings and associate photos with them (saved as Base64 strings due to a limitation of the web service), I have to loop through all findings and their photos within an audit and set their sync value to true.
While I perform this loop I see a memory spike, jumping from around 40MB up to 500MB (for roughly 350 photos and 255 findings), and this number never goes down. On average our users are creating around 1000 findings and 500-700 photos before attempting to use this feature. I have attempted to use @autoreleasepool blocks to keep the memory down, but it never seems to get released.
for (Finding * __autoreleasing f in self.audit.findings) {
    @autoreleasepool {
        [f setToSync:@YES];
        NSLog(@"%@", f.idFinding);
        for (FindingPhoto * __autoreleasing p in f.photos) {
            @autoreleasepool {
                [p setToSync:@YES];
                p = nil;
            }
        }
        f = nil;
    }
}
The relationships and retain cycles look like this
Audit has a strong reference to Finding
Finding has a weak reference to Audit and a strong reference to FindingPhoto
FindingPhoto has a weak reference to Finding
What am I missing in terms of being able to effectively loop through these objects and set their properties without causing such a huge spike in memory? I'm assuming it has something to do with so many Base64 strings being loaded into memory when looping through but never being released.
So, first, make sure you have a batch size set on the fetch request. Choose a relatively small number, but not too small because this isn't for UI processing. You want to batch a reasonable number of objects into memory to reduce loading overhead while keeping memory usage down. Try 50 or 100 and see how it goes, then consider upping the batch size a little.
If all of the objects you're loading are managed objects then the correct way to evict them during processing is to turn them into faults. That's done by calling refreshObject:mergeChanges: on the context. BUT - that discards any changes, and your loop is specifically there to make changes.
So, what you should really be doing is batch saving the objects you've modified and then turning those objects back into faults to remove the data from memory.
So, in your loop, keep a counter of how many you've modified and save the context each time you hit that count and refresh all the objects that were processed so far. The batch on the fetch and the batch size to save should be the same number.
There's probably a big difference in size between your "Finding" objects and the associated images, so your primary aim should be to redesign your database so that unfaulting (loading) a Finding object does not automatically load the Base64-encoded image.
That's actually one of the major strengths of Core Data: loading only part of an object hierarchy. Just move the Base64-encoded data into its own (managed) object so that Core Data does not load it eagerly. It will still be loaded as needed when the relationship is touched.

Give reads priority over writes in Elasticsearch

I have an EC2 server running Elasticsearch 0.9 with an nginx server in front of it for read/write access. My index has about 750k small-to-medium documents. I have a pretty continuous stream of minimal writes (mainly updates) to the content. The speed and consistency I get from search is fine with me, but I have some sporadic timeout issues with multi-get (/_mget).
On some pages in my app, our server will request a multi-get of a dozen to a few thousand documents (this usually takes less than 1-2 seconds). The requests that fail do so with a 30,000 millisecond timeout from the nginx server. I am assuming this happens because the index is temporarily locked for writing/optimizing purposes. Does anyone have any ideas on what I can do here?
A temporary solution would be to lower the timeout and return a user-friendly message saying the documents couldn't be retrieved (however, users would still have to wait ~10 seconds to see an error message).
One of my other thoughts was to give reads priority over writes: any time someone is trying to read a part of the index, don't allow any writes/locks to that section. I don't think this would be scalable, and it may not even be possible.
Finally, I was thinking I could have a read-only alias and a write-only alias. I can figure out how to set this up through the documentation, but I am not sure if it will actually work like I expect it to (and I'm not sure how to reliably test it in a local environment). If I set up aliases like this, would the read-only alias still have moments where the index is locked because of information being written through the write-only alias?
I'm sure someone else has come across this before; what is the typical solution for making sure a user can always read data from the index, with a higher priority than writes? I would consider increasing our server power if required. Currently we have 2 m2.xlarge EC2 instances: one is the primary and one is the replica, each with 4 shards.
An example dump of cURL info from a failed request (with an error of Operation timed out after 30000 milliseconds with 0 bytes received):
{
"url":"127.0.0.1:9200\/_mget",
"content_type":null,
"http_code":100,
"header_size":25,
"request_size":221,
"filetime":-1,
"ssl_verify_result":0,
"redirect_count":0,
"total_time":30.391506,
"namelookup_time":7.5e-5,
"connect_time":0.0593,
"pretransfer_time":0.059303,
"size_upload":167002,
"size_download":0,
"speed_download":0,
"speed_upload":5495,
"download_content_length":-1,
"upload_content_length":167002,
"starttransfer_time":0.119166,
"redirect_time":0,
"certinfo":[
],
"primary_ip":"127.0.0.1",
"redirect_url":""
}
After more monitoring using the Paramedic plugin, I noticed that I would get timeouts when my CPU hit ~80-98% (with no obvious spikes in indexing/searching traffic). I finally stumbled across a helpful thread on the Elasticsearch forum. It seems this happens when the index is doing a refresh and large merges are occurring.
Merges can be throttled at the cluster or index level, and I've updated indices.store.throttle.max_bytes_per_sec from the default 20mb to 5mb. This can be done at runtime with the cluster update settings API:
PUT /_cluster/settings HTTP/1.1
Host: 127.0.0.1:9200

{
  "persistent" : {
    "indices.store.throttle.max_bytes_per_sec" : "5mb"
  }
}
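If you would rather apply the same setting from Node, here is a minimal sketch using only the built-in http module; it sends exactly the request shown above to a local cluster on port 9200.

// Sketch only: apply the persistent merge-throttle setting via the
// cluster update settings API (same request as the raw HTTP example above).
var http = require('http');

var body = JSON.stringify({
  persistent: { 'indices.store.throttle.max_bytes_per_sec': '5mb' }
});

var req = http.request({
  host: '127.0.0.1',
  port: 9200,
  path: '/_cluster/settings',
  method: 'PUT',
  headers: {
    'Content-Type': 'application/json',
    'Content-Length': Buffer.byteLength(body)
  }
}, function (res) {
  res.on('data', function (chunk) { console.log(chunk.toString()); });
});
req.end(body);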
So far Paramedic is showing a decrease in CPU usage, from an average of ~5-25% down to an average of ~1-5%. Hopefully this will help me avoid the 90%+ spikes that were locking up my queries before; I'll report back by selecting this answer if I don't have any more problems.
As a side note, I guess I could have opted for more CPU-balanced EC2 instances (rather than memory-optimized). I think I'm happy with my current choice, but my next purchase will take CPU more into account.
