I'm trying to make a bucket of images publicly readable, even for objects uploaded from another AWS account. I have the following bucket policy in place:
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "AddPerm",
      "Effect": "Allow",
      "Principal": "*",
      "Action": "s3:GetObject",
      "Resource": "arn:aws:s3:::mybucketname/*"
    }
  ]
}
This works great when I upload using credentials from the primary account, but it doesn't apply when I upload from the other account that was added via ACL. From what I've read, you can add bucket-owner-full-control or public-read, but not both. My end goal is to allow the object to be fully accessible by both AWS accounts AND have public read access on upload. Is this possible (ideally without two requests)?
The accepted answer above is incorrect: the bucket policy is ignored for objects uploaded from another account, because the bucket owner does not own those objects.
The correct way to apply multiple ACL grants on a cross-account upload is as follows:
aws s3 cp --recursive blahdir s3://bucketname/blahdir/ --cache-control public,max-age=31536000 --grants read=uri=http://acs.amazonaws.com/groups/global/AllUsers full=id=S3_ACCOUNT_CANONICAL_ID
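If you are uploading from code rather than the CLI, the same pair of grants can be passed on each request. This is only a minimal boto3 sketch, run from the uploading (other) account; the bucket name, key, and S3_ACCOUNT_CANONICAL_ID are placeholders as above:
import boto3

s3 = boto3.client('s3')

# Grant public read plus full control to the bucket owner's canonical ID.
s3.put_object(
    Bucket='bucketname',
    Key='blahdir/image.jpg',
    Body=open('image.jpg', 'rb'),
    CacheControl='public,max-age=31536000',
    GrantRead='uri="http://acs.amazonaws.com/groups/global/AllUsers"',
    GrantFullControl='id="S3_ACCOUNT_CANONICAL_ID"',
)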
No, it seems you can only specify one canned ACL when creating an object in Amazon S3.
For details of what each Canned ACL means, see: Canned ACL
For details of how ownership works, see: Amazon S3 Bucket and Object Ownership
Personally, I would not recommend using ACLs to control access. They are a hold-over from the early days of Amazon S3. These days, I would recommend using a Bucket Policy if you wish to make a large number of objects public, especially if they are in the same bucket/path.
Thus, an object can be uploaded with bucket-owner-full-control and the Bucket Policy can make it publicly accessible.
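For illustration, a minimal boto3 sketch of the upload side of that suggestion (the bucket name and key are placeholders); the uploader in the other account sets the canned ACL, and the bucket policy from the question is what would serve the object publicly:
import boto3

s3 = boto3.client('s3')

# Hand full control to the bucket owner at upload time.
s3.put_object(
    Bucket='mybucketname',
    Key='images/photo.jpg',
    Body=b'...',
    ACL='bucket-owner-full-control',
)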
I have built my own Docker container that provides inference code to be deployed as an endpoint on Amazon SageMaker. However, this container needs access to some files from S3. The IAM role used has access to all the S3 buckets I am trying to reach.
Code to download files using a boto3 client:
import boto3

model_bucket = 'my-bucket'

def download_file_from_s3(s3_path, local_path):
    # Uses whatever credentials boto3 resolves from the environment (here, the IAM role)
    client = boto3.client('s3')
    client.download_file(model_bucket, s3_path, local_path)
The IAM role's policies:
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Action": [
        "s3:GetObject",
        "s3:PutObject",
        "s3:DeleteObject"
      ],
      "Effect": "Allow",
      "Resource": [
        "arn:aws:s3:::my-bucket/*"
      ]
    }
  ]
}
Starting the Docker container locally allows me to download files from S3 as expected.
When deployed as an endpoint on SageMaker, however, the request times out:
botocore.vendored.requests.exceptions.ConnectTimeout: HTTPSConnectionPool(host='my-bucket.s3.eu-central-1.amazonaws.com', port=443): Max retries exceeded with url: /path/to/my-file (Caused by ConnectTimeoutError(<botocore.awsrequest.AWSHTTPSConnection object at 0x7f66244e69b0>, 'Connection to my-bucket.s3.eu-central-1.amazonaws.com timed out. (connect timeout=60)'))
Any help is appreciated!
For security reasons, SageMaker doesn't let the container access S3 directly; you need to hook it up to a VPC:
https://docs.aws.amazon.com/sagemaker/latest/dg/host-vpc.html
For anyone coming across this question, when creating a model, the 'Enable Network Isolation' property defaults to True.
From AWS docs:
If you enable network isolation, the containers are not able to make any outbound network calls, even to other AWS services such as Amazon S3. Additionally, no AWS credentials are made available to the container runtime environment.
So this property needs to be set to False in order to connect to any other AWS service.
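For reference, here is a minimal sketch of creating the model with network isolation disabled and a VPC configuration via boto3; the model name, image URI, role ARN, subnet and security group IDs are all hypothetical placeholders:
import boto3

sm = boto3.client('sagemaker')

# Disable network isolation so the container can make outbound calls (e.g. to S3);
# the VpcConfig values below are placeholders for your own VPC setup.
sm.create_model(
    ModelName='my-inference-model',
    PrimaryContainer={
        'Image': '123456789012.dkr.ecr.eu-central-1.amazonaws.com/my-image:latest',
        'ModelDataUrl': 's3://my-bucket/model.tar.gz',
    },
    ExecutionRoleArn='arn:aws:iam::123456789012:role/MySageMakerRole',
    EnableNetworkIsolation=False,
    VpcConfig={
        'SecurityGroupIds': ['sg-0123456789abcdef0'],
        'Subnets': ['subnet-0123456789abcdef0'],
    },
)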
There are many GitHub issues open on the Terraform repo about this, with lots of interesting comments, but as of now I still see no solution.
Terraform stores plain text values, including passwords, in tfstate files.
Most users need to store it remotely so the team can work concurrently on the same infrastructure, and most of them keep the state files in S3.
So how do you hide your passwords?
Is there anyone here using Terraform for production? Do you keep your passwords in plain text?
Do you have a special workflow to remove or hide them? What happens when you run a terraform apply then?
I've considered the following options:
store them in Consul - I don't use Consul
remove them from the state file - this requires another process to be executed each time and I don't know how Terraform will handle the resource with an empty/unreadable/not working password
store a default password that is then changed (so Terraform will have a not working password in the tfstate file) - same as above
use the Vault resource - it sounds like it's not a complete workflow yet
store them in Git with git-repo-crypt - Git is not an option either
globally encrypt the S3 bucket - this will not prevent people from seeing plain-text passwords if they have "manager"-level access to AWS, but it seems to be the best option so far
From my point of view, this is what I would like to see:
state file does not include passwords
state file is encrypted
passwords in the state file are "pointers" to other resources, like "vault:backend-type:/path/to/password"
each Terraform run would gather the needed passwords from the specified provider
This is just a wish.
But to get back to the question - how do you use Terraform in production?
I would also like to know the best practice here, but let me share my own approach, although it is limited to AWS. Basically, I do not manage credentials with Terraform.
Set an initial password for RDS, ignore the difference with a lifecycle block, and change it later. The way to ignore the difference is as follows:
resource "aws_db_instance" "db_instance" {
...
password = "hoge"
lifecycle {
ignore_changes = ["password"]
}
}
IAM users are managed by Terraform, but IAM login profiles, including passwords, are not. I believe that IAM passwords should be managed by individuals and not by the administrator.
API keys used by applications are also not managed by Terraform. They are encrypted with AWS KMS (Key Management Service) and the encrypted data is saved in the application's Git repository or an S3 bucket. The advantage of KMS encryption is that decryption permissions can be controlled by the IAM role; there is no need to manage keys for decryption.
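As an illustration of that flow, a minimal boto3 sketch (the key alias and secret value are placeholders); note that kms.encrypt only handles small payloads such as API keys, up to about 4 KB:
import base64
import boto3

kms = boto3.client('kms')

# Encrypt once and commit the (base64-encoded) ciphertext to the repo or S3.
ciphertext = kms.encrypt(
    KeyId='alias/my-app-key',      # hypothetical key alias
    Plaintext=b'my-api-key-value',
)['CiphertextBlob']
encoded = base64.b64encode(ciphertext).decode()

# At runtime, the application's IAM role only needs kms:Decrypt on the key.
plaintext = kms.decrypt(
    CiphertextBlob=base64.b64decode(encoded),
)['Plaintext'].decode()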
Although I have not tried it yet, I recently noticed that aws ssm put-parameter --key-id can be used as a simple key-value store supporting KMS encryption, so this might be a good alternative as well.
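A rough sketch of that alternative with boto3 (the parameter name and key alias are placeholders):
import boto3

ssm = boto3.client('ssm')

# Store the secret as a SecureString encrypted with the given KMS key.
ssm.put_parameter(
    Name='/myapp/db_password',     # hypothetical parameter name
    Value='hoge',
    Type='SecureString',
    KeyId='alias/my-app-key',
    Overwrite=True,
)

# The application reads and decrypts it in one call.
password = ssm.get_parameter(
    Name='/myapp/db_password',
    WithDecryption=True,
)['Parameter']['Value']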
I hope this helps you.
The whole remote state stuff is being reworked for Terraform 0.9, which should open things up for locking of remote state and potentially encryption of the whole state file or just the secrets.
Until then we simply use multiple AWS accounts and write state for the stuff that goes into that account into an S3 bucket in that account. In our case we don't really care too much about the secrets that end up in there because if you have access to read the bucket then you normally have a fair amount of access in that account. Plus our only real secrets kept in state files are RDS database passwords and we restrict access on the security group level to just the application instances and the Jenkins instances that build everything so there is no direct access from the command line on people's workstations anyway.
I'd also suggest adding encryption at rest on the S3 bucket (just because it's basically free) and versioning so you can retrieve older state files if necessary.
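If you prefer to set that up in code rather than the console, here is a minimal boto3 sketch (the bucket name is a placeholder):
import boto3

s3 = boto3.client('s3')

# Default encryption at rest for every object written to the state bucket.
s3.put_bucket_encryption(
    Bucket='my-tf-state-bucket',
    ServerSideEncryptionConfiguration={
        'Rules': [
            {'ApplyServerSideEncryptionByDefault': {'SSEAlgorithm': 'AES256'}}
        ]
    },
)

# Versioning, so older state files can be retrieved if necessary.
s3.put_bucket_versioning(
    Bucket='my-tf-state-bucket',
    VersioningConfiguration={'Status': 'Enabled'},
)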
To take it further, if you are worried about people with read access to your S3 buckets containing state you could add a bucket policy that explicitly denies access from anyone other than some whitelisted roles/users which would then be taken into account above and beyond any IAM access. Extending the example from a related AWS blog post we might have a bucket policy that looks something like this:
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Deny",
      "Principal": "*",
      "Action": "s3:*",
      "Resource": [
        "arn:aws:s3:::MyTFStateFileBucket",
        "arn:aws:s3:::MyTFStateFileBucket/*"
      ],
      "Condition": {
        "StringNotLike": {
          "aws:userId": [
            "AROAEXAMPLEID:*",
            "AIDAEXAMPLEID"
          ]
        }
      }
    }
  ]
}
Where AROAEXAMPLEID represents an example role ID and AIDAEXAMPLEID represents an example user ID. These can be found by running:
aws iam get-role --role-name ROLE-NAME
and
aws iam get-user --user-name USER-NAME
respectively.
If you really want to go down the encrypting the state file fully then you'd need to write a wrapper script that makes Terraform interact with the state file locally (rather than remotely) and then have your wrapper script manage the remote state, encrypting it before it is uploaded to S3 and decrypting it as it's pulled.
I have multiple EC2 instances originating from a single VPC and I want to assign a bucket policy to my S3 bucket to make sure that only traffic from that VPC is allowed to access the bucket. So I created an endpoint for that VPC, and it added all the policies and routes to the routing table. I assigned the following policy to my bucket:
{
  "Version": "2012-10-17",
  "Id": "Policy1415115909153",
  "Statement": [
    {
      "Sid": "Access-to-specific-VPCE-only",
      "Action": "s3:*",
      "Effect": "Allow",
      "Resource": [
        "arn:aws:s3:::examplebucket",
        "arn:aws:s3:::examplebucket/*"
      ],
      "Condition": {
        "StringEquals": {
          "aws:sourceVpce": "vpce-111bbb22"
        }
      }
    }
  ]
}
but it does not work: when I connect to my bucket using the AWS SDK for Node.js, I get an access denied error. The Node.js application is actually running on an EC2 instance launched in the same VPC as the endpoint.
I even tried a VPC-level bucket policy, but I still get an access denied error. Can anyone tell me if I need to include an endpoint parameter in the SDK S3 connection, or anything else?
I have an EMR Spark Job that needs to read data from S3 on one account and write to another.
I split my job into two steps.
Read data from S3 (no credentials required because my EMR cluster is in the same account).
Read the data in the local HDFS created by step 1 and write it to an S3 bucket in another account.
I've attempted setting the hadoopConfiguration:
sc.hadoopConfiguration.set("fs.s3n.awsAccessKeyId", "<your access key>")
sc.hadoopConfiguration.set("fs.s3n.awsSecretAccessKey","<your secretkey>")
And exporting the keys on the cluster:
$ export AWS_SECRET_ACCESS_KEY=
$ export AWS_ACCESS_KEY_ID=
I've tried both cluster and client mode as well as spark-shell with no luck.
Each of them returns an error:
ERROR ApplicationMaster: User class threw exception: com.amazon.ws.emr.hadoop.fs.shaded.com.amazonaws.services.s3.model.AmazonS3Exception:
Access Denied
The solution is actually quite simple.
Firstly, EMR clusters have two roles:
A service role (EMR_DefaultRole) that grants permissions to the EMR service (e.g. for launching Amazon EC2 instances)
An EC2 role (EMR_EC2_DefaultRole) that is attached to EC2 instances launched in the cluster, giving them access to AWS credentials (see Using an IAM Role to Grant Permissions to Applications Running on Amazon EC2 Instances)
These roles are explained in: Default IAM Roles for Amazon EMR
Therefore, each EC2 instance launched in the cluster is assigned the EMR_EC2_DefaultRole role, which makes temporary credentials available via the Instance Metadata service. (For an explanation of how this works, see: IAM Roles for Amazon EC2.) Amazon EMR nodes use these credentials to access AWS services such as S3, SNS, SQS, CloudWatch and DynamoDB.
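If you want to see those temporary credentials from a cluster node, here is a quick sketch against the instance metadata service (IMDSv1 style; IMDSv2 additionally requires a session token):
import json
import urllib.request

base = 'http://169.254.169.254/latest/meta-data/iam/security-credentials/'

# The first call returns the attached role name (e.g. EMR_EC2_DefaultRole);
# the second returns its temporary AccessKeyId/SecretAccessKey/Token.
role_name = urllib.request.urlopen(base).read().decode()
creds = json.loads(urllib.request.urlopen(base + role_name).read().decode())
print(creds['AccessKeyId'], creds['Expiration'])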
Secondly, you will need to add permissions to the Amazon S3 bucket in the other account to permit access via the EMR_EC2_DefaultRole role. This can be done by adding a bucket policy to the S3 bucket (here named other-account-bucket) like this:
{
  "Id": "Policy1",
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "Stmt1",
      "Action": "s3:*",
      "Effect": "Allow",
      "Resource": [
        "arn:aws:s3:::other-account-bucket",
        "arn:aws:s3:::other-account-bucket/*"
      ],
      "Principal": {
        "AWS": [
          "arn:aws:iam::ACCOUNT-NUMBER:role/EMR_EC2_DefaultRole"
        ]
      }
    }
  ]
}
This policy grants all S3 permissions (s3:*) to the EMR_EC2_DefaultRole role that belongs to the account matching the ACCOUNT-NUMBER in the policy, which should be the account in which the EMR cluster was launched. Be careful when granting such permissions -- you might want to grant permissions only to GetObject rather than granting all S3 permissions.
That's all! The bucket in the other account will now accept requests from the EMR nodes because they are using the EMR_EC2_DefaultRole role.
Disclaimer: I tested the above by creating a bucket in Account-A and assigning permissions (as shown above) to a role in Account-B. An EC2 instance was launched in Account-B with that role. I was able to access the bucket from the EC2 instance via the AWS Command-Line Interface (CLI). I did not test it within EMR, however it should work the same way.
With Spark you can also use AssumeRole to access an S3 bucket in another account, by using an IAM role defined in that account. This makes it easier for the other account's owner to manage the permissions granted to the Spark job. Managing access via S3 bucket policies can be a pain, because the access rights are distributed across multiple locations rather than all contained within a single IAM role.
Here is the hadoopConfiguration:
"fs.s3a.credentialsType" -> "AssumeRole",
"fs.s3a.stsAssumeRole.arn" -> "arn:aws:iam::<<AWSAccount>>:role/<<crossaccount-role>>",
"fs.s3a.impl" -> "com.databricks.s3a.S3AFileSystem",
"spark.hadoop.fs.s3a.server-side-encryption-algorithm" -> "aws:kms",
"spark.hadoop.fs.s3a.server-side-encryption-kms-master-key-id" -> "arn:aws:kms:ap-southeast-2:<<AWSAccount>>:key/<<KMS Key ID>>"
An external ID can also be used as a passphrase:
"spark.hadoop.fs.s3a.stsAssumeRole.externalId" -> "GUID created by other account owner"
We were using Databricks for the above; we have not tried EMR yet.
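For completeness, a minimal PySpark sketch of wiring those values in at session creation, assuming your platform honours these (Databricks-specific) s3a keys; the spark.hadoop. prefix is how Spark passes settings through to the Hadoop configuration, and the role and KMS ARNs remain placeholders as above:
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder
    .config("spark.hadoop.fs.s3a.credentialsType", "AssumeRole")
    .config("spark.hadoop.fs.s3a.stsAssumeRole.arn",
            "arn:aws:iam::<<AWSAccount>>:role/<<crossaccount-role>>")
    .config("spark.hadoop.fs.s3a.impl", "com.databricks.s3a.S3AFileSystem")
    .config("spark.hadoop.fs.s3a.server-side-encryption-algorithm", "aws:kms")
    .getOrCreate()
)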
I believe you need to assign an IAM role to your compute nodes (you probably already have done this), then grant cross-account access to that role via IAM on the "Remote" account. See http://docs.aws.amazon.com/IAM/latest/UserGuide/tutorial_cross-account-with-roles.html for the details.
For controlling access to resources, IAM roles are generally managed as a standard practice. Assumed roles are used when you want to access resources in a different account. If you or your organisation follow the same practice, then you should follow https://aws.amazon.com/blogs/big-data/securely-analyze-data-from-another-aws-account-with-emrfs/.
The basic idea here is to use a custom credentials provider, through which EMRFS obtains the credentials to access objects in the S3 buckets.
You can go one step further and parameterize the ARNs for STS and the buckets for the JAR created in that blog.
I cannot authenticate to AWS Simple Queue Service (SQS) from an EC2 instance using its associated IAM role with the Boto 2.38 library (and Python 3).
I couldn't find anything specific in the documentation about it, but as far as I could understand from examples and other questions around, it was supposed to work just by opening a connection like this:
conn = boto.sqs.connect_to_region('us-east-1')
queue = conn.get_queue('my_queue')
Instead, I get a null object from the connect method, unless I provide credentials in my environment or explicitly to the method.
I'm pretty sure my role is ok, because it works for other services like S3, describing EC2 tags, sending metrics to CloudWatch, etc, all transparently. My SQS policy is like this:
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "SQSFullAccess",
      "Effect": "Allow",
      "Action": [
        "sqs:*"
      ],
      "Resource": [
        "arn:aws:sqs:us-east-1:<account_id>:<queue_name1>",
        "arn:aws:sqs:us-east-1:<account_id>:<queue_name2>"
      ]
    }
  ]
}
In order to get rid of any suspicion about my policy, I even associated a FullAdmin policy to my role temporarily, without success.
I also verified that it doesn't work with the AWS CLI either (which, as far as I know, uses Boto as well). So the only conclusion I could come up with is that this is a Boto issue with the SQS client.
Would anyone have a different experience with it? I know that switching to Boto 3 would probably solve it, but I'm not considering that right now, and if it is really a bug, I think it should be reported on GitHub anyway.
Thanks.
Answering myself.
Boto's 2.38 SQS client does work with IAM Roles. I had a bug in my application.
As for the AWS CLI, a credentials file (~/.aws/credentials) was present in my local account and was being used instead of the instance's role, because the instance role is the last place the CLI looks for credentials.
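If you run into the same thing, a quick way to check which identity your credential chain actually resolves to is an STS call; shown with boto3 here purely as an illustration, since the question itself is about Boto 2:
import boto3

# Prints the ARN of whatever credentials win the lookup order
# (environment variables, ~/.aws/credentials, then the instance role).
print(boto3.client('sts').get_caller_identity()['Arn'])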