Batch operation in AWS SSM putParameter: is there an option to write parameters in bulk to Parameter Store in AWS?

Is there an option to write parameters in bulk to Parameter Store in AWS? I have tried the putParameter API, but from the documentation I can see that only one parameter can be updated at a time. This operation takes around 20 milliseconds (I may be wrong), so if I need to update some 20 parameters, it will exceed 400 ms. Typically, I have a requirement to accommodate up to 50 parameters. Is there a better way to handle updating parameters in Parameter Store?
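For what it's worth, there is (as far as I know) no bulk variant of PutParameter, so a common workaround is to issue the individual calls concurrently. A minimal sketch, assuming boto3 and placeholder parameter names/values:

import boto3
from concurrent.futures import ThreadPoolExecutor

ssm = boto3.client("ssm")

def put(param):
    # Each call still writes a single parameter; Overwrite updates existing ones.
    return ssm.put_parameter(
        Name=param["name"],
        Value=param["value"],
        Type="String",
        Overwrite=True,
    )

# Hypothetical set of 50 parameters to write.
params = [{"name": f"/app/config/key{i}", "value": str(i)} for i in range(50)]

# Issue the PutParameter calls in parallel instead of one after another.
with ThreadPoolExecutor(max_workers=10) as pool:
    results = list(pool.map(put, params))

Each call still counts individually against the SSM API throughput limits, so the pool size is something to tune rather than a guarantee.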

Related

How to create custom retry logic for aiobotocore?

I'm trying to upload a lot of files to S3. This cannot be done with the standard AWS CLI because of the translation required between file names on disk and object names in S3. Indeed, many of the objects don't exist at all on disk.
I keep getting an error:
botocore.exceptions.ClientError: An error occurred (SlowDown) when calling the PutObject operation (reached max retries: 0): Please reduce your request rate
It doesn't seem to make a difference whether I use boto3 / botocore / aioboto3 / aiobotocore. I've tried various configurations of retry logic as described here. Nothing seems to fix the problem. That includes all three retry modes and retry counts ranging from 0 to 50.
I could add custom retry logic to every method that calls the client but that's going to be a lot of work and feels like the wrong approach.
Is it possible to customize the retry logic used by boto3 or aiobotocore?
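Not an authoritative fix, but one way to keep custom retry logic out of every call site is a small wrapper that backs off on the SlowDown error code. A sketch assuming an aiobotocore S3 client (the helper name and delays are made up):

import asyncio
import botocore.exceptions

async def put_with_backoff(client, max_attempts=8, base_delay=1.0, **kwargs):
    # Retry PutObject on SlowDown with exponential back-off; re-raise anything else.
    delay = base_delay
    for attempt in range(1, max_attempts + 1):
        try:
            return await client.put_object(**kwargs)
        except botocore.exceptions.ClientError as err:
            code = err.response["Error"]["Code"]
            if code != "SlowDown" or attempt == max_attempts:
                raise
            await asyncio.sleep(delay)
            delay *= 2  # wait longer after each throttled attempt

You would then call await put_with_backoff(client, Bucket=..., Key=..., Body=...) wherever put_object is currently called directly.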

getParametersByPath page size

In the AWS SDK for Node.js, the getParametersByPath() SSM method returns paged data, meaning I have to make several calls in a loop. Is there a way to disable pagination or increase the page size (beyond 10, which is the maximum for the MaxResults parameter)?
Alternatively, if several calls have to be made, can the results be collected into an array of Promises to be resolved at once?
Basically, we have about 12 parameters that we need to load on Lambda startup. Currently, getting the parameters takes about 2 seconds because two consecutive calls are made. Ideally it would be either one call or two calls made in parallel.
I'm going to give you a simple answer for this; it's a question I've had to solve a few times:
Alternatively, if several calls have to be made - can results be collected into array of Promises to be resolved at once?
Yes, and here's how -
const { SSM } = require('aws-sdk');
const ssm = new SSM();
function getSSMStuff(path, memo = [], nextToken) {
  return ssm
    .getParametersByPath({ Path: path, WithDecryption: true, Recursive: true, NextToken: nextToken, MaxResults: 10 })
    .promise()
    .then(({ Parameters, NextToken }) => {
      const newMemo = memo.concat(Parameters);
      return NextToken ? getSSMStuff(path, newMemo, NextToken) : newMemo;
    });
}
If you invoke that with a path, it'll recursively call itself to get all parameters under that path, finally resolving to an array of complete parameters. Obviously, you should add your own flavor of error handling, but that's the gist.
This is a good question. Let me outline some concepts, AWS design decisions and how I would work around the limitations they impose.
1. AWS Server Side SSM API
All AWS client SDKs, including the AWS-SDK for Node.js, make API calls towards the respective AWS service's REST API, e.g. the API action GetParametersByPath of the AWS Systems Manager endpoint [1].
2. Pagination
The AWS APIs usually implement the so-called cursor-based pagination concept. This type of pagination has different characteristics from page-based pagination (i.e. classic offset-limit pagination), as described in several Medium articles [2][3].
One major limitation of cursor-based pagination is that it doesn't allow sending parallel requests for different batches. There is no way to jump to a specific page; the client has to iterate through all the prior pages.
Looking at the pros and cons of each pagination type, the AWS engineers probably made a trade-off in favor of consistency at the cost of performance.
3. API Limitation
AWS states that the valid range for the MaxResults parameter is 1-10. That means there is no way for a client to extend the page size beyond 10, and all SDKs must adhere to that limit when implementing pagination. Disabling pagination [4] only has the effect of returning a single page of up to 10 items.
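To illustrate the same server-side limit with boto3 (just as an example; the /my/app path is a placeholder): even the SDK's built-in paginator only hides the NextToken bookkeeping, the pages are still fetched sequentially and capped at 10 items each.

import boto3

ssm = boto3.client("ssm")
paginator = ssm.get_paginator("get_parameters_by_path")

parameters = []
# Each page holds at most 10 parameters and is requested one after another.
for page in paginator.paginate(Path="/my/app", Recursive=True, WithDecryption=True):
    parameters.extend(page["Parameters"])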
4. Possible Workarounds
You could sync the SSM API results to a custom DynamoDB table and query DynamoDB instead of SSM. Whether this solution is viable depends on several characteristics, e.g. consistency requirements, how often the SSM parameters change and the confidentiality of the SSM parameter values (or which parameter attributes are read by the Lambda function).
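A rough sketch of the query side of this first workaround, assuming a hypothetical DynamoDB table that some separate process keeps in sync with SSM:

import boto3
from boto3.dynamodb.conditions import Key

dynamodb = boto3.resource("dynamodb")
table = dynamodb.Table("ssm-parameter-cache")  # hypothetical table name and schema

def load_parameters(prefix):
    # A single DynamoDB query replaces the paginated SSM calls at Lambda startup.
    response = table.query(KeyConditionExpression=Key("prefix").eq(prefix))
    return response["Items"]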
You could send concurrent requests to the API if you split the set of expected results properly and use the ParameterFilters API parameter [1]. You can split by the following keys: tag:.+|Name|Type|KeyId|Path|Label|Tier|DataType, using the BeginsWith or Equals option [5].
This solution requires that you can make assumptions about the results you expect to receive. You must ensure that a particular subset does not contain more than 10 entries. If there are, for example, at most 10 SSM parameters per department, you could do the following: send one request via getParametersByPath() per department, specifying MaxResults=10 and Path=/department/<name> or ParameterFilters=[{Key=tag:department,Option=Equals,Values=["<name>"]}]. These department-specific requests could be sent concurrently (a sketch follows below).
Workaround 2 could be considered the "two concurrent requests" approach you mentioned in the question, and workaround 1 the "one single request" approach. Both workarounds require either dropping the consistency property or making assumptions about how the data is partitioned.
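A minimal sketch of workaround 2, using boto3 for brevity; the department names are hypothetical and each path is assumed to contain at most 10 parameters:

import boto3
from concurrent.futures import ThreadPoolExecutor

ssm = boto3.client("ssm")
departments = ["sales", "engineering", "finance"]  # hypothetical partitioning

def fetch(department):
    # One request per department; MaxResults=10 is the API maximum.
    response = ssm.get_parameters_by_path(
        Path=f"/department/{department}",
        Recursive=True,
        WithDecryption=True,
        MaxResults=10,
    )
    return response["Parameters"]

# The department-specific requests are independent, so they can run concurrently.
with ThreadPoolExecutor(max_workers=len(departments)) as pool:
    parameters = [p for batch in pool.map(fetch, departments) for p in batch]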
References
[1] https://docs.aws.amazon.com/systems-manager/latest/APIReference/API_GetParametersByPath.html#systemsmanager-GetParametersByPath-request-NextToken
[2] https://medium.com/innomizetech/how-to-build-an-api-with-aws-lambda-to-paginate-data-using-serverless-41c4b6b676a4
[3] https://medium.com/swlh/paginating-requests-in-apis-d4883d4c1c4c
[4] https://docs.aws.amazon.com/cli/latest/userguide/cli-usage-pagination.html#cli-usage-pagination-serverside
[5] https://docs.aws.amazon.com/systems-manager/latest/APIReference/API_ParameterStringFilter.html#systemsmanager-Type-ParameterStringFilter-Option
Yes, it is achievable. I gathered 92 parameters with the logic below (it is coded in Java, but can be used as a reference):
// initialize
List<Parameter> parameters = new ArrayList<>();
String nextToken = null;
// logic
do {
    GetParametersByPathRequest request = new GetParametersByPathRequest();
    request.setWithDecryption(true);
    request.setPath("......");
    request.setRecursive(true);
    request.setNextToken(nextToken);
    GetParametersByPathResult parametersByPathResult = source.getParametersByPath(request);
    nextToken = parametersByPathResult.getNextToken();
    parameters.addAll(parametersByPathResult.getParameters());
} while (nextToken != null);

Overcoming Azure Vision Read API Transactions-Per-Second (TPS) limit

I am working on a system where we are calling the Vision Read API to extract the contents of raster PDFs. Files are of different sizes, ranging from one page to several hundred pages.
Files are stored in Azure Blob Storage, and there will be a function that pushes files to the Read API once all files have been uploaded to the blob. There could be hundreds of files.
Therefore, when the process starts, a large number of documents are expected to be sent for text extraction per second. But the Vision API has a limit of 10 transactions per second, including Read.
I am wondering what the best approach would be: some type of throttling or a queue?
Is there any integration available (say with a queue) from which the Read API can pull documents, and is there any type of push notification available to signal the completion of a read operation? How can I prevent timeouts due to exceeding the 10 TPS limit?
Per my understanding, there are two key points you want to know:
1. How to overcome the 10 TPS limit while you have a lot of files to read.
2. The best approach to get the Read operation status and result.
Your question is a bit broad, but maybe I can provide you with some suggestions:
For Q1: generally, if you reach the TPS limit you will get an HTTP 429 response, and you must wait for some time before calling the API again, or else the next API call will be refused as well. Usually we retry the operation using something like an exponential back-off retry policy to handle the 429 error:
1) Check the HTTP response code in your code.
2) When the HTTP response code is 429, retry the operation after N seconds, where N is a value you define yourself, e.g. 10 seconds. For example, the following is a 429 response; here you could set your wait time to (26 + n) seconds (again, you can pick n yourself, e.g. n = 5):
{
    "error": {
        "statusCode": 429,
        "message": "Rate limit is exceeded. Try again in 26 seconds."
    }
}
3) If the retry succeeds, continue with the next operation.
4) If the retry fails with 429 again, retry after N*N seconds (you can tune this yourself too); this is the exponential back-off part.
5) If that attempt also fails with 429, retry after N*N*N seconds, and so on.
6) Always wait for the current operation to succeed before moving on; the waiting time grows exponentially.
A minimal Python sketch of this back-off loop follows the list.
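The endpoint URL, headers and payload below are placeholders rather than a specific Vision SDK call; the sketch is only meant to show the shape of the loop:

import time
import requests

def post_with_backoff(url, headers, payload, base_delay=10, max_attempts=5):
    # Call the endpoint, backing off exponentially on HTTP 429 responses.
    delay = base_delay
    for attempt in range(max_attempts):
        response = requests.post(url, headers=headers, json=payload)
        if response.status_code != 429:
            response.raise_for_status()
            return response
        time.sleep(delay)        # wait N seconds the first time...
        delay *= base_delay      # ...then N*N, N*N*N, and so on
    raise RuntimeError(f"Still throttled after {max_attempts} attempts")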
For Q2: as we know, we can use this API to get the Read operation status/result.
If you want to get a completion notification/result, you should poll each of your operations at intervals, e.g. send a check request every 10 seconds. You can use an Azure Function or an Azure Automation runbook to create asynchronous tasks that check the read operation status and, once it is done, handle the result based on your requirements.
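As a rough illustration only (assuming the v3.x Read flow, where the analyze call returns an Operation-Location URL whose JSON result carries a status field), the polling task could look something like this:

import time
import requests

def wait_for_read_result(operation_url, key, interval=10, timeout=600):
    # Poll the Read operation URL until it reports success or failure.
    headers = {"Ocp-Apim-Subscription-Key": key}
    deadline = time.time() + timeout
    while time.time() < deadline:
        result = requests.get(operation_url, headers=headers).json()
        status = result.get("status")
        if status == "succeeded":
            return result
        if status == "failed":
            raise RuntimeError(f"Read operation failed: {result}")
        time.sleep(interval)  # check again after the chosen interval
    raise TimeoutError("Read operation did not finish in time")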
Hope it helps. If you have any further concerns, please feel free to let me know.

How does Kinesis keep the offset and push the record again when an event fails in Lambda?

I am new to AWS Lambda and Kinesis. Please help with the following question.
I have a Kinesis stream as the source for a Lambda function, and the target is again Kinesis. I have the following queries.
The system must not lose any records.
If any record fails processing in Lambda, how is it pulled into Lambda again? How are the unprocessed records kept? How does Kinesis track the offset to process the next record?
Please advise.
From the AWS Lambda docs about using Lambda with Kinesis:
If your function returns an error, Lambda retries the batch until processing succeeds or the data expires. Until the issue is resolved, no data in the shard is processed. To avoid stalled shards and potential data loss, make sure to handle and record processing errors in your code.
In this context, also consider the Retention Period of Kinesis:
The retention period is the length of time that data records are accessible after they are added to the stream. A stream’s retention period is set to a default of 24 hours after creation. You can increase the retention period up to 168 hours (7 days)
As mentioned in the first quote, AWS will drop the data once the retention period has elapsed. This means for you:
a) Take care that your Lambda function handles errors correctly.
b) If it's important to keep all records, also store them in persistent storage, e.g. DynamoDB.
In addition to that, you should read about duplicate Lambda executions as well. There is a great blog post explaining how you can achieve an idempotent implementation, and another Stack Overflow question & answer covers it too.
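To make the error-handling point (a) concrete, here is a minimal sketch of a Python handler for a Kinesis-triggered Lambda; process() is a placeholder for your business logic:

import base64
import json

def handler(event, context):
    # Kinesis delivers records in batches; an unhandled exception makes Lambda
    # retry the whole batch until it succeeds or the data expires.
    for record in event["Records"]:
        payload = json.loads(base64.b64decode(record["kinesis"]["data"]))
        try:
            process(payload)
        except Exception:
            # Re-raising blocks the shard and forces a retry of the batch.
            # Alternatively, persist the bad record (e.g. to DynamoDB) and continue.
            raise

def process(payload):
    ...  # placeholder for the actual processing / forwarding to the target stream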

OData query for all rows from the last 10 minutes

I need to filter rows from an Azure Table Store that are less than 10 minutes old. I'm using an Azure Function App integration to query the table, so a coded solution is not viable in this case.
I'm aware of the datetime type, but for this I have to specify an explicit datetime, for example -
Timestamp gt datetime'2018-07-10T12:00:00.1234567Z'
However, this is insufficient as I need the query to run on a timer every 10 minutes.
According to the OData docs, there are built-in functions such as totaloffsetminutes() and now(), but using these causes the function to fail:
[Error] Exception while executing function: Functions.FailedEventsCount. Microsoft.WindowsAzure.Storage: The remote server returned an error: (400) Bad Request.
Is there a way to query a Table Store dynamically in this way?
Turns out that this was easier than expected.
I added the following query filter to the Azure Table Store input integration -
Timestamp gt datetime'{afterDateTime}'
In conjunction with a parameter in the Function trigger route, and Bob's your uncle -
FailedEventsCount/after/{afterDateTime}
I appreciate that for other use cases it may not be viable to pass in the datetime, but for me that is perfectly acceptable.
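For completeness, the caller (or timer job) just has to compute the "10 minutes ago" timestamp and put it into the route. A small sketch with a placeholder function URL:

from datetime import datetime, timedelta, timezone
import requests

# ISO-8601 UTC timestamp for "10 minutes ago", matching the OData datetime filter.
after = (datetime.now(timezone.utc) - timedelta(minutes=10)).strftime("%Y-%m-%dT%H:%M:%SZ")

# Hypothetical function URL exposing the {afterDateTime} route parameter.
url = f"https://<your-function-app>.azurewebsites.net/api/FailedEventsCount/after/{after}"
print(requests.get(url).json())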
