AWS CLI for searching a file in s3 bucket - aws-cli

I want to search for a file named abc.zip across S3 buckets. There are nearly 60 buckets, and each bucket has 2 to 3 levels of subdirectories (folders). I tried to perform the search using the AWS CLI commands below, but even though the file exists in the bucket, the results never show it.
aws s3api list-objects --bucket bucketname --region ca-central-1 \
--recursive --query "Contents[?contains(Key, 'abc.zip')]"
aws s3 ls --summarize --human-readable --recursive bucketname \
--region ca-central-1 | egrep 'abc.zip'
For all of the above commands I don't see the filename in the output, yet when I check the bucket manually the file is there.
Is there any way I can find the file?

I used your command from #1 without --recursive, because that option throws Unknown options: --recursive. The file I was searching for is on the second level of the bucket and it was found. I also did not use --region.
My guess is you are using an old version of the AWS CLI or pointing to the wrong bucket.
My working command:
aws s3api list-objects --bucket XXXXX --query "Contents[?contains(Key, 'animate.css')]"
[
    {
        "LastModified": "2015-06-14T23:29:03.000Z",
        "ETag": "\"e5612f9c5bc799b8b129e9200574dfd2\"",
        "StorageClass": "STANDARD",
        "Key": "css/animate.css",
        "Owner": {
            "DisplayName": "XXXX",
            "ID": "XXXX"
        },
        "Size": 78032
    }
]
If you decide to upgrade your CLI client: https://github.com/aws/aws-cli/tree/master
The current version is awscli-1.15.77, which you can check with aws --version.

I tried it the following way:
aws s3 ls s3://Bucket1/folder1/2019/ --recursive | grep filename.csv
This outputs the actual path where the file exists:
2019-04-05 01:18:35 111111 folder1/2019/03/20/filename.csv
Hope this helps!

I know this is ancient, but I found a way to do this without piping text to grep...
aws s3api list-objects-v2 --bucket myBucket --prefix 'myFolder' \
--query "Contents[*]|[?ends_with(Key,'jpg')].[Key]"

I think the previous answers are correct, but if you want to make this bucket-agnostic you can use the script below. All you have to do is change the value of the variable search_value at the top to whatever you are searching for, and add your access key ID and secret:
#!/bin/bash
export AWS_ACCESS_KEY_ID=your_key; export AWS_SECRET_ACCESS_KEY=your_secret;
search_value="3ds"
# collect every bucket name in the account into an array
my_array=( $(aws s3api list-buckets --query "Buckets[].Name" | grep \" | sed 's/\"//g' | sed 's/\,//g') )
my_array_length=${#my_array[@]}
# list each bucket recursively and grep (case-insensitively) for the search value
for element in "${my_array[@]}"
do
    echo "----- ${element}"
    aws s3 ls s3://"${element}" --recursive | grep -i "$search_value"
done
Warning: it will search every single bucket in your account, so be prepared for a long search.
It does a pattern search, so it will find any key that contains the value.
Lastly, this is a case-insensitive search (you can disable that by removing -i from the grep line).

Related

Resuming interrupted s3 download with s3api

I want to resume an S3 file download within a Linux Docker container. I am using
aws s3api get-object \
--bucket mybucket \
--key myfile \
--range "bytes=$size-" \
/dev/fd/3 3>>myfile
It seems /dev/fd/3 3>>myfile works on macOS and appends the next range of data to the existing file. However, when I try the same command on Linux, it replaces the original file with the next range of contents.
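One possible workaround (a sketch, not from the original post) is to download the missing range to a temporary file and append it with cat, which avoids re-opening the destination through /dev/fd on Linux. The bucket and key names come from the question; using GNU stat to get the current size is an assumption:
# sketch: get the current size (0 if the file does not exist yet), fetch the rest, append
size=$(stat -c%s myfile 2>/dev/null || echo 0)
aws s3api get-object \
--bucket mybucket \
--key myfile \
--range "bytes=${size}-" \
/tmp/myfile.part
cat /tmp/myfile.part >> myfile && rm /tmp/myfile.part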

How to use an environment variable in an AWS CLI command

I run this command and it works:
aws elb describe-load-balancers --query 'LoadBalancerDescriptions[?VPCId==`vpc-#########`]|[].LoadBalancerName' --region us-east-2
If I try to use an environment variable, it does not work:
aws elb describe-load-balancers --query 'LoadBalancerDescriptions[?VPCId==`$VPC_ID`]|[].LoadBalancerName' --region us-east-2
I know that VPC_ID is valid - echo $VPC_ID returns the correct value
What am I not seeing?
Thanks!!!!!
I also tried this command, with the same results.
This works fine:
aws elb describe-load-balancers --output text --query 'LoadBalancerDescriptions[?Instances[?InstanceId==`i-0############`]].[LoadBalancerName]' --region us-east-2
This returns nothing:
aws elb describe-load-balancers --output text --query 'LoadBalancerDescriptions[?Instances[?InstanceId=="$InstanceID"]].[LoadBalancerName]' --region us-east-2
I know that the environment variable $InstanceID is populated and correct - I perform an echo $InstanceID and get the correct ID output.
Got it!
The environment variable needs to be wrapped in braces, ${InstanceID}, and the query put in double quotes so the shell expands it.
This works -
aws elb describe-load-balancers --output text --query "LoadBalancerDescriptions[?Instances[?InstanceId=='${InstanceID}']].LoadBalancerName" --region us-east-2
I am able to reproduce this using the following:
export MY_VPC_ID=vpc-1234
echo 'LoadBalancerDescriptions[?VPCId==`$MY_VPC_ID`]|[].LoadBalancerName'
OUTPUT:
LoadBalancerDescriptions[?VPCId==`$MY_VPC_ID`]|[].LoadBalancerName
I believe this has to do with how bash interprets quotes, as shown in this other post:
Evaluating variables in a string
Can you try using this?
echo "LoadBalancerDescriptions[?VPCId==\"$MY_VPC_ID\"]|[].LoadBalancerName"
OUTPUT:
LoadBalancerDescriptions[?VPCId=="vpc-1234"]|[].LoadBalancerName
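Putting that together, the original command should work once the whole query is wrapped in double quotes (so the shell expands the variable) and the value is placed in single quotes (so JMESPath treats it as a raw string literal). A sketch using the placeholder VPC ID from above:
export MY_VPC_ID=vpc-1234
aws elb describe-load-balancers --region us-east-2 \
--query "LoadBalancerDescriptions[?VPCId=='${MY_VPC_ID}']|[].LoadBalancerName"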

Running two aws cli commands both at once

I would like to run the following:
aws ram get-resource-share-invitations
aws ram accept-resource-share-invitation --resource-share-invitation-arn <value from first query>
both in one line, taking the output from the first command and using it in the second.
Is there a way to do this? I want to use the above script inside a Terraform null_resource. Since we cannot get output from a null_resource, I was thinking that combining both commands into one would solve my problem.
Yes, you can chain AWS CLI commands together by using xargs.
CAVEATS: I don't use AWS RAM so I'm unable to provide a specific example but this should get you on the right road. I also have not tested this in Terraform.
This code describes all classic ELB resources and sends the load-balancer-name of each to describe-load-balancer-attributes, which requires it:
aws elb describe-load-balancers --query 'LoadBalancerDescriptions[*].[LoadBalancerName]' --output text | xargs -I {} aws elb describe-load-balancer-attributes --load-balancer-name {}
What I think will work but I have no way to test is:
aws ram get-resource-share-associations --association-type <blah> --query 'resourceShareAssociations[*].[resourceShareArn]' --output text |xargs -I {} aws ram accept-resource-share-invitation --resource-share-invitation-arn {}
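For the RAM case specifically, something along these lines might work (an untested sketch; it assumes the output fields resourceShareInvitations[].resourceShareInvitationArn and status, and that only PENDING invitations should be accepted):
aws ram get-resource-share-invitations --query 'resourceShareInvitations[?status==`PENDING`].[resourceShareInvitationArn]' --output text | xargs -I {} aws ram accept-resource-share-invitation --resource-share-invitation-arn {}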

How to get an RDS endpoint for a specific VPC using AWS CLI

I have the command to list all the RDS endpoints running in my AWS account, but I want to find the endpoint for the RDS instance running in the same VPC as the EC2 instance I want to use it from.
I have multiple VPCs up with multiple RDS instances, so when I issue the command it gives me all of them. How can I filter this to show only the one in the same VPC?
I run the command -
aws rds --region us-east-2 describe-db-instances --query "DBInstances[*].Endpoint.Address"
And I get -
"acme-networkstack.vbjrxfom0phf.us-east-2.rds.amazonaws.com",
"acme-aws-beta-network.vbjrxfom0phf.us-east-2.rds.amazonaws.com",
"acme-demo.vbjrxfom0phf.us-east-2.rds.amazonaws.com",
"acme-dev.vbjrxfom0phf.us-east-2.rds.amazonaws.com"
I only want the one endpoint that is in the same VPC as the instance I am running the CLI command from.
Thanks!
Ernie
Here's a little script that should do the trick; just replace the ec2 describe-instances with your rds CLI command:
#!/bin/bash
mac=`curl -s http://169.254.169.254/latest/meta-data/mac`
vpcID=`curl -s http://169.254.169.254/latest/meta-data/network/interfaces/macs/$mac/vpc-id`
aws ec2 describe-instances --region eu-west-1 --filter "Name=vpc-id,Values=$vpcID"
You're first curling the instance metadata to find its VpcId, and then filtering the output of your CLI command to limit it to a certain VPC.
https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/ec2-instance-metadata.html
https://docs.aws.amazon.com/cli/latest/userguide/cli-usage-output.html
describe-db-instances has a limited set of filters, which doesn't include the VPC. The solution I suggest uses a combination of the metadata information from the host and jq to select only the endpoints that match the VPC.
First, you can get the VPC ID as suggested by WarrenG.
#!/bin/bash
mac=`curl -s http://169.254.169.254/latest/meta-data/mac`
VPC_ID=`curl -s http://169.254.169.254/latest/meta-data/network/interfaces/macs/$mac/vpc-id`
Then use the AWS CLI in combination with jq to derive the desired output:
aws rds describe-db-instances | jq -r --arg VPC_ID "$VPC_ID" '.DBInstances[] | select(.DBSubnetGroup.VpcId==$VPC_ID) | .Endpoint.Address'
I haven't run this from a script but it works from the command line. If it doesn't work in a script let me know.
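If you prefer to avoid jq, the same filter should also be expressible with the CLI's own --query (a sketch, assuming $VPC_ID is set as in the snippet above):
aws rds describe-db-instances \
--query "DBInstances[?DBSubnetGroup.VpcId=='${VPC_ID}'].Endpoint.Address" \
--output text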
References
https://docs.aws.amazon.com/cli/latest/reference/rds/describe-db-instances.html
Passing bash variable to jq select

How do I run a python script and files located in an aws s3 bucket

I have a Python script, pscript.py, which takes the input parameters -c input.txt -s 5 -o out.txt. The files are all located in an AWS S3 bucket. How do I run it after creating an instance? Do I have to mount the bucket on the EC2 instance and execute the code, or use Lambda? I am not sure; reading so many AWS documentation pages is kind of confusing.
The command-line run is as follows:
python pscript.py -c input.txt -s 5 -o out.txt
You should copy the file from Amazon S3 to the EC2 instance:
aws s3 cp s3://my-bucket/pscript.py .
You can then run your above command.
Please note that, to access the object in Amazon S3, you will need to assign an IAM Role to the EC2 instance. The role needs sufficient permission to access the bucket/object.
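Putting it together, the full flow on the EC2 instance might look like this (a sketch, assuming the bucket is named my-bucket and holds both the script and the input file from the question):
aws s3 cp s3://my-bucket/pscript.py .
aws s3 cp s3://my-bucket/input.txt .
python pscript.py -c input.txt -s 5 -o out.txt
# optionally copy the result back to the bucket
aws s3 cp out.txt s3://my-bucket/out.txt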
