How many objects are returned by aws s3api list-objects? - linux

I am using:
aws s3api list-objects --endpoint-url https://my.end.point/ --bucket my.bucket.name --query 'Contents[].Key' --output text
to get the list of files in a bucket.
The aws s3api list-objects documentation page says that this command returns at most 1,000 objects; however, in my case it returns the names of all files in my bucket. For example, when I run the following command:
aws s3api list-objects --endpoint-url https://my.end.point/ --bucket my.bucket.name --query 'Contents[].Key' --output text | tr "\t" "\n" | wc -l
I get 13512 displayed, meaning that more than 13 thousand file names were returned.
Am I missing something?
I use the following aws cli version:
aws-cli/1.10.57 Python/2.7.3 Linux/3.2.0-4-amd64 botocore/1.4.47

Returns some or all (up to 1000) of the objects in a bucket. You can use the request parameters as selection criteria to return a subset of the objects in a bucket. [1]
I think the phrase "(up to 1000)" in the documentation's description is highly misleading. It refers to the maximum page size of each underlying HTTP request that the CLI sends, not to the total number of objects in the command's output. The documentation of the --page-size option makes this clear:
The size of each page to get in the AWS service call. This does not affect the number of items returned in the command's output. Setting a smaller page size results in more calls to the AWS service, retrieving fewer items in each call. This can help prevent the AWS service calls from timing out.
It becomes even clearer in the AWS documentation about pagination [2], which says:
For commands that can return a large list of items, the AWS Command Line Interface (AWS CLI) adds three options that you can use to control the number of items included in the output when the AWS CLI calls a service's API to populate the list.
By default, the AWS CLI uses a page size of 1000 and retrieves all available items. For example, if you run aws s3api list-objects on an Amazon S3 bucket that contains 3,500 objects, the CLI makes four calls to Amazon S3, handling the service-specific pagination logic for you in the background and returning all 3,500 objects in the final output.
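To make the page-size versus total-items distinction concrete, here is a minimal Python sketch of the loop the CLI runs on your behalf. It does not talk to AWS: fake_list_objects is a hypothetical in-memory stand-in for one ListObjects request (at most max_keys keys that sort after marker, plus an IsTruncated flag), and the client resumes after the last key of each page. With 3,500 objects and the default page size of 1000 it makes four calls, matching the example quoted above.

```python
def fake_list_objects(keys, marker="", max_keys=1000):
    """Hypothetical stand-in for one ListObjects call (no network, no boto3).

    Returns at most `max_keys` keys that sort after `marker`, and whether
    more keys remain (S3's IsTruncated flag).
    """
    remaining = [k for k in sorted(keys) if k > marker]
    page = remaining[:max_keys]
    return {"Contents": [{"Key": k} for k in page],
            "IsTruncated": len(remaining) > max_keys}


def list_all(keys, page_size=1000):
    """Request pages until the response is no longer truncated,
    roughly what `aws s3api list-objects` does by default."""
    result, marker, calls = [], "", 0
    while True:
        resp = fake_list_objects(keys, marker, page_size)
        calls += 1
        page = [obj["Key"] for obj in resp["Contents"]]
        result.extend(page)
        if not resp["IsTruncated"]:
            return result, calls
        marker = page[-1]  # resume after the last key seen


if __name__ == "__main__":
    keys = [f"file-{i:05d}" for i in range(3500)]
    objects, calls = list_all(keys)
    print(len(objects), calls)                # 3500 4
    print(list_all(keys, page_size=100)[1])   # 35
```

Changing page_size only changes how many requests are made, never how many keys come back, which is what the --page-size documentation describes.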
As Ankit correctly stated, the --max-items option is the right way to limit the result and stop the automatic pagination:
To include fewer items at a time in the AWS CLI output, use the --max-items option. The AWS CLI still handles pagination with the service as described above, but prints out only the number of items at a time that you specify. [2]
References
[1] https://docs.aws.amazon.com/cli/latest/reference/s3api/list-objects.html
[2] https://docs.aws.amazon.com/cli/latest/userguide/cli-usage-pagination.html

Try using --max-items with the command.
The documentation says that when more items are available than --max-items allows, the output includes a NextToken value. Pass it as --starting-token in the next call to continue paging.
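The resume flow can be modelled the same way. In this Python sketch, cli_list roughly approximates --max-items and --starting-token; the token is simply the last key shown, which is a hypothetical simplification, since the real NextToken is an opaque string that you should pass back unchanged. list_page is likewise a made-up stand-in for one ListObjects request.

```python
def list_page(keys, marker, max_keys):
    """Hypothetical single ListObjects request: keys after `marker`, plus a truncation flag."""
    remaining = [k for k in sorted(keys) if k > marker]
    return remaining[:max_keys], len(remaining) > max_keys


def cli_list(keys, max_items, starting_token=None, page_size=1000):
    """Rough model of --max-items / --starting-token (token format is made up)."""
    marker = starting_token or ""
    items, truncated = [], False
    while len(items) < max_items:
        page, truncated = list_page(keys, marker, page_size)
        items.extend(page)
        if not truncated:
            break
        marker = page[-1]
    shown = items[:max_items]
    output = {"Contents": [{"Key": k} for k in shown]}
    # More objects exist if we fetched extra keys or the last page was truncated.
    if len(items) > max_items or truncated:
        output["NextToken"] = shown[-1]
    return output


if __name__ == "__main__":
    keys = [f"file-{i:05d}" for i in range(2500)]
    token, total, calls = None, 0, 0
    while True:
        out = cli_list(keys, max_items=1000, starting_token=token)
        total += len(out["Contents"])
        calls += 1
        token = out.get("NextToken")
        if token is None:
            break
    print(total, calls)  # 2500 3
```

The loop in the usage section is what a script would do with the real CLI: feed each NextToken back as --starting-token until no token is returned.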

Related

How to find length of result array in Azure CLI via JMESPath?

I am trying to "explore" json results from an Azure CLI command using the --query switch (e.g. az functionapp list --query <something>), and to get started I'd like the length of the resulting array.
The Azure CLI help says nothing specific and points to jmespath.org, which does show that a length function exists; however, it requires an argument. I have no name for that argument, since it is the root (outermost) array returned by the list command.
It seems from jmespath.org that length(something) is what I want, but I don't know what to put in for the "something" part. What do I put there? Or am I going about this all wrong?
Because az functionapp list returns JSON whose root node is an array, you can get the length of that array with the following syntax, where @ refers to the current node (here, the root array):
az functionapp list --query "[] | length(@)"

List files from s3 greater than some lastModified date

We currently use the S3 listObjects method (https://docs.aws.amazon.com/AWSJavaScriptSDK/latest/AWS/S3.html#listObjects-property) in a Node.js Lambda to get all objects. It returns up to 1,000 objects per call. Is there any way to use this method to get only the files whose LastModified date is later than a given date?
This isn't possible via the S3 API: listing cannot filter on LastModified.
The best you can do is get creative with your object naming scheme. S3 lists keys in ascending lexicographic order, so name objects in reverse order, starting with something like ZZZZZZZZZZZ, then ZZZZZZZZZZY, and so on. The newest objects then come back first, and you can stop listing once you reach older ones.
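One way to apply that idea, sketched in Python under loud assumptions: the upload time is encoded into the key by your own code (S3 sets LastModified itself and cannot sort by it), reverse_time_key and MAX_TS are hypothetical, and zero-padded "inverted" epoch seconds play the role of the ZZZ... prefixes. Because keys are listed in ascending order, the client can stop at the first key that is too old instead of listing the whole bucket.

```python
from datetime import datetime, timezone

MAX_TS = 9_999_999_999  # hypothetical ceiling (epoch seconds) so prefixes stay 10 digits


def reverse_time_key(uploaded: datetime, name: str) -> str:
    """Hypothetical naming scheme: newer uploads get lexicographically smaller keys."""
    inverted = MAX_TS - int(uploaded.timestamp())
    return f"{inverted:010d}-{name}"


def newer_than(sorted_keys, cutoff: datetime):
    """Walk keys in ascending order (as S3 lists them) and stop at the first
    key whose encoded time is not later than `cutoff`."""
    limit = MAX_TS - int(cutoff.timestamp())
    result = []
    for key in sorted_keys:
        if int(key.split("-", 1)[0]) >= limit:
            break  # this key and every later one is at or before the cutoff
        result.append(key)
    return result


if __name__ == "__main__":
    uploads = {
        "a.csv": datetime(2021, 1, 1, tzinfo=timezone.utc),
        "b.csv": datetime(2022, 1, 1, tzinfo=timezone.utc),
        "c.csv": datetime(2023, 1, 1, tzinfo=timezone.utc),
    }
    keys = sorted(reverse_time_key(ts, name) for name, ts in uploads.items())
    cutoff = datetime(2021, 6, 1, tzinfo=timezone.utc)
    print([k.split("-", 1)[1] for k in newer_than(keys, cutoff)])  # ['c.csv', 'b.csv']
```

The fixed width matters: without zero padding, lexicographic order and numeric order would disagree.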

How to use JMESPath to query AWS CLI RDS instances by DBInstanceIdentifier

I need a list of RDS DBInstanceIdentifier that match the String "foobar" in their name. I found many solutions with exact match, but not substring matching. My approach looks as follows:
I get a list of all DBInstanceIdentifier using:
aws rds describe-db-instances --query "DBInstances[*].[DBInstanceIdentifier][]"
which looks like
[
"machine-001-alice-abcdefg",
"machine-002-bob-abcdefg",
"machine-003-foobar-abcdefg"
]
I then apply a filter to the list, like in the last example of the JMESPath tutorial:
aws rds describe-db-instances --query "DBInstances[*].[DBInstanceIdentifier][]|[?contains(@,'dev') =='true']"
If I change the statement to !=, I get the full list, so it seems I have the filter statement wrong.
true needs to be backticked (a JSON literal), not quoted: 'true' is a string, and the boolean returned by contains() never equals it. Inside double quotes the backticks also need to be escaped; other shells may vary.
aws rds describe-db-instances --query "DBInstances[*].[DBInstanceIdentifier][]|[?contains(@,'dev')==\`true\`]"
aws rds describe-db-instances --query "DBInstances[*].[DBInstanceIdentifier][]|[?contains(@,'dev')!=\`true\`]"
You can also omit the comparison to true, but I couldn't invert this successfully:
aws --profile pollen-nonprod rds describe-db-instances --query "DBInstances[*].[DBInstanceIdentifier][]|[?contains(@,'dev')]"
(I'd normally do this sort of thing with jq but that's a different solution rather than necessarily a better one)
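To check the filter logic outside the shell, here is a plain-Python analogue using the sample identifiers from the question (the function names are hypothetical). It shows why == 'true' matched nothing and != 'true' matched everything: the containment test yields a boolean, and a boolean is never equal to the string 'true'.

```python
ids = [
    "machine-001-alice-abcdefg",
    "machine-002-bob-abcdefg",
    "machine-003-foobar-abcdefg",
]


def contains_filter(names, needle):
    """Analogue of [?contains(@, 'needle') == `true`]; comparing a boolean
    to true is the same as using it directly."""
    return [n for n in names if needle in n]


def string_true_filter(names, needle):
    """Analogue of the broken [?contains(@, 'needle') == 'true']."""
    return [n for n in names if (needle in n) == "true"]


def string_not_true_filter(names, needle):
    """Analogue of [?contains(@, 'needle') != 'true']: always true, so nothing is removed."""
    return [n for n in names if (needle in n) != "true"]


if __name__ == "__main__":
    print(contains_filter(ids, "foobar"))              # ['machine-003-foobar-abcdefg']
    print(string_true_filter(ids, "foobar"))           # []
    print(len(string_not_true_filter(ids, "foobar")))  # 3
```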

AWS cli. how to query snapshots and their name tags

First of all, thanks for taking the time to help me out on this one.
I have a list of 12,300 snapshots and am working on deleting certain ones, so I'm trying to list them all first through the CLI.
I want to get the SnapshotId, the StartTime, and, from the tags, the 'Name'.
I tried quite a few queries, but all of them result in null for the name :/
This is my latest one:
aws ec2 describe-snapshots --query 'Snapshots[*].{ID:SnapshotId,Time:StartTime,Name:Tags[?Key=='Name'].Value[*]}'
Is this something one can do? or should i query all Key pairs, and then filter them out with --filters ?
Few issues to be considered:
Beware of the type of quote marks around the key names (backticks, not single quotes, because a single quote inside the single-quoted --query string ends the shell string early)
Forcing a single value out of the tag array.
You should specify the --owner-ids otherwise all accessible snapshots will be listed (including ones that don't belong to your account)
This command works:
aws ec2 describe-snapshots --query 'Snapshots[*].{ID:SnapshotId,Time:StartTime,Name:Tags[?Key==`Name`]|[0].Value}' --owner-ids <YOUR-ACCOUNT-ID>
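The snapshot query can be mirrored in Python to show what the Key==`Name` filter and the |[0] step do for each snapshot, including snapshots with no Name tag or no Tags key at all (both come out as None, i.e. null in JSON output). The snapshot IDs and timestamps below are made-up sample data, not real CLI output.

```python
def snapshot_rows(snapshots):
    """Python analogue of
    Snapshots[*].{ID:SnapshotId,Time:StartTime,Name:Tags[?Key==`Name`]|[0].Value}
    """
    rows = []
    for snap in snapshots:
        # Tags[?Key==`Name`]: keep only tags whose key is "Name".
        names = [t["Value"] for t in (snap.get("Tags") or []) if t["Key"] == "Name"]
        rows.append({
            "ID": snap["SnapshotId"],
            "Time": snap["StartTime"],
            "Name": names[0] if names else None,  # |[0].Value: first match, else null
        })
    return rows


if __name__ == "__main__":
    sample = [  # hypothetical data
        {"SnapshotId": "snap-111", "StartTime": "2016-01-01T00:00:00.000Z",
         "Tags": [{"Key": "env", "Value": "prod"}, {"Key": "Name", "Value": "db-backup"}]},
        {"SnapshotId": "snap-222", "StartTime": "2016-01-02T00:00:00.000Z",
         "Tags": [{"Key": "env", "Value": "dev"}]},
        {"SnapshotId": "snap-333", "StartTime": "2016-01-03T00:00:00.000Z"},
    ]
    for row in snapshot_rows(sample):
        print(row)
```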

return text list of values one line per instance-id with awscli --query

I have instances in AWS that have the same ReservationId (they were launched at the same time and have AmiLaunchIndex values of 0 through x). My goal is to produce text output with one line per instance, such as this. I added column headers for clarity.
OwnerId ReservationId InstanceId PrivateIpAddress AmiLaunchIndex
12345678910 r-poiu4567 i-asdf1234 10.0.0.1 0
12345678910 r-poiu4567 i-qwer4312 10.0.1.1 1
... etc ...
In the jmespath gitter channel, the map function was suggested as a way to accomplish this, but I can't figure out how to use the function. Any suggestions?
--query 'Reservations[*].Instances[*].[InstanceId]' --output text
Just add the brackets.
You would need to run the following command:
aws ec2 describe-instances \
--filters "Name=reservation-id,Values=r-poiu4567" \
--query 'Reservations[*].{owner:OwnerId,ReservationId:ReservationId,instance:Instances[].InstanceId | [0]}' \
--output text
You can add any other fields you want.
This provides the desired output (all elements on one line), without the header, something like:
i-08eec92943c9cc576 325979260958 r-0b13a131efa6b3af8
i-07a25c4ae7e6abecb 325979260958 r-05a51aefe5b72358d
....
Unfortunately, to do this right I think we'd need https://github.com/jmespath/jmespath.site/pull/6
In this particular case you can hack this result by using the owner in Network Interfaces, which is almost certain to be the same in practice:
Reservations[].Instances[].[NetworkInterfaces[0].OwnerId, InstanceId, KeyName]
(use an object instead of an array if you want the column headings)
aws ec2 describe-instances --query 'Reservations[*].
{
id:ReservationId,
requester:RequesterId,
instance:Instances[].InstanceId |[0],
lifecycle:Instances[].InstanceLifecycle | [0]
}
' \
--output text
This worked for me.
Finally, I was able to get my desired result by using the map function of jmespath as follows. Kudos to folks in the jmespath/chat channel on gitter.
aws ec2 describe-instances --query "map(&[], Reservations[].[OwnerId,ReservationId,Instances[].InstanceId | []])" --output text
An equivalent alternative expression is:
aws ec2 describe-instances --query "Reservations[].[OwnerId,ReservationId,Instances[].InstanceId | []] | map(&[], @)" --output text
See this JMESPath issue and this one, where this is discussed.
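To see what map(&[], ...) does, here is a Python analogue using the sample values from the question; the dictionary layout is a hypothetical, trimmed-down describe-instances response. Each reservation first becomes [OwnerId, ReservationId, [instance IDs]], and &[] splices the nested list into the row one level deep, so each row prints as a single tab-separated line, roughly as --output text prints a list of lists.

```python
def flatten_once(row):
    """Like JMESPath's [] applied to one row: splice nested lists in, one level deep."""
    flat = []
    for item in row:
        if isinstance(item, list):
            flat.extend(item)
        else:
            flat.append(item)
    return flat


def reservation_lines(reservations):
    """Analogue of
    map(&[], Reservations[].[OwnerId, ReservationId, Instances[].InstanceId | []])
    joined with tabs and newlines."""
    rows = [[r["OwnerId"], r["ReservationId"], [i["InstanceId"] for i in r["Instances"]]]
            for r in reservations]
    return "\n".join("\t".join(flatten_once(row)) for row in rows)


if __name__ == "__main__":
    reservations = [  # hypothetical sample based on the question
        {"OwnerId": "12345678910", "ReservationId": "r-poiu4567",
         "Instances": [{"InstanceId": "i-asdf1234"}, {"InstanceId": "i-qwer4312"}]},
    ]
    print(reservation_lines(reservations))
```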
