When trying to run queries against AWS Athena from Python (boto3), the following error is raised:
botocore.exceptions.ClientError: An error occurred
(AccessDeniedException) when calling the StartQueryExecution
operation: User: arn:aws:iam::account-id:user/sa.prd is not
authorized to perform: athena:StartQueryExecution on resource:
arn:aws:athena:us-east-1:account-id:workgroup/primary
I don't have access to the AWS console. I was also informed that there is another user, "sa.prd.athena", that has the right permissions (which "sa.prd" apparently does not).
Is it possible to tell boto3 to use a different user? Right now I don't specify any user at all.
If it's not possible to use a different user, is it possible to set some kind of policy for boto3 to use at runtime (again, because I don't have access to the AWS Management Console)?
Thanks,
BR
The User in AWS is determined by the credentials that are used to sign the API call to the AWS API. There are several ways to pass these credentials to AWS SDKs in general (and boto3 in particular).
Boto3 looks for credentials in these places and takes them from the first one where they're present:
Hard-coded credentials passed when instantiating a client
Credentials stored in environment variables
Credentials stored in ~/.aws/credentials (By default it uses those of the default profile)
In the instance metadata service on EC2/ECS/Lambda
Since you're not directly setting up credentials, I assume it takes them from the SDK configuration (3), so you could just overwrite them while instantiating your Athena client like this:
import boto3

athena_client = boto3.client(
    'athena',
    aws_access_key_id=ACCESS_KEY,
    aws_secret_access_key=SECRET_KEY,
    aws_session_token=SESSION_TOKEN
)
This is an adapted example from the documentation; you need to specify your own credentials in place of the uppercase placeholder variables.
Hardcoding these is considered bad practice though, so you might want to look into option (2), using environment variables, or into setting up another profile in your local SDK configuration and telling the client to use that, as sketched below. Information on both can be found in the boto3 docs linked above.
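For instance, if the working credentials for "sa.prd.athena" were configured as a named profile (the profile name here is taken from the question; it's an assumption that such a profile exists on your machine), a minimal sketch would be:

import boto3

# Assumes a profile was set up beforehand, e.g. via:
#   aws configure --profile sa.prd.athena
session = boto3.Session(profile_name='sa.prd.athena')
athena_client = session.client('athena')

All clients created from that session sign their calls with the profile's credentials, so StartQueryExecution would then run as that user.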
Spark allows us to read directly from Google BigQuery, as shown below:
df = spark.read.format("bigquery") \
    .option("credentialsFile", "googleKey.json") \
    .option("parentProject", "projectId") \
    .option("table", "project.table") \
    .load()
However, having the key saved on the virtual machine isn't a great idea. I have the Google key saved as JSON, securely, in a credential-management tool; the key is read on demand and saved into a variable called googleKey.
Is it possible to pass the JSON into spark.read, or to pass the credentials in as a dictionary?
The other option is the credentials option. From the spark-bigquery-connector docs:
How do I authenticate outside GCE / Dataproc?
Credentials can also be provided explicitly, either as a parameter or from Spark runtime configuration. They should be passed in as a base64-encoded string directly.
// Globally
spark.conf.set("credentials", "<SERVICE_ACCOUNT_JSON_IN_BASE64>")
// Per read/Write
spark.read.format("bigquery").option("credentials", "<SERVICE_ACCOUNT_JSON_IN_BASE64>")
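Since you already have the key text in a variable, you can base64-encode it yourself and pass it per read. A minimal PySpark sketch, assuming googleKey holds the JSON key string as described in the question:

import base64

# The connector expects the service-account JSON as a base64-encoded string.
encoded_key = base64.b64encode(googleKey.encode("utf-8")).decode("utf-8")

df = spark.read.format("bigquery") \
    .option("credentials", encoded_key) \
    .option("parentProject", "projectId") \
    .option("table", "project.table") \
    .load()

This way the key never has to be written to disk on the VM.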
This is more of a chicken-and-egg situation: if you are storing the credential file in a secret manager (I hope that's not what your credential-management tool is), how would you access the secret manager? You might need a key for that, and then where would you store that key?
For this, Azure created managed identities, through which two different services can talk to each other without explicitly providing any keys (credentials).
If you are running from Dataproc, then the node has a built-in service account which you can control at cluster creation. In this case you do not need to pass any credentials/credentialsFile option.
If you are running on another cloud or on-prem, you can use the local secret manager, or implement the connector's AccessTokenProvider, which gives you full customization of the credentials creation.
Using the AWS SDK I can make a get request and fetch a document; I will then know whether I have the IAM access to reach the database.
Is there a way to test, with the Node.js AWS SDK, whether I have allow access for the action dynamodb:getItem? Of course I could just write a query, but is there a way to check without having to spend time writing a meaningless one?
The easiest way I can think of right at this moment is to try a simple getItem like you mentioned, with a primary key, but to do it with the low-level API; then you are not writing a "meaningless query." A sketch of that idea follows. If I find another way, I'll add it here.
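For illustration, here is what such a probe could look like in boto3 terms (the question uses the Node.js SDK, but the idea is the same; the table name and key schema below are hypothetical):

import boto3
from botocore.exceptions import ClientError

dynamodb = boto3.client('dynamodb')
try:
    # A get_item on any key, even one that doesn't exist, still exercises
    # the dynamodb:GetItem permission.
    dynamodb.get_item(
        TableName='my-table',                   # hypothetical table name
        Key={'pk': {'S': 'permission-probe'}}   # low-level attribute-value format
    )
    print('dynamodb:GetItem is allowed')
except ClientError as e:
    if e.response['Error']['Code'] == 'AccessDeniedException':
        print('dynamodb:GetItem is denied')
    else:
        raise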
You can also simply check the user's permissions through the console, or through the CLI with the get-user-policy command.
CLI Approach:
aws iam get-user-policy --user-name Bob --policy-name ExamplePolicy
With the help of this command, you can check the rights granted to that user; a boto3 equivalent is sketched below. For details, look into the docs.
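The same lookup can be done from code; a minimal boto3 sketch (Bob and ExamplePolicy are the placeholder names from the CLI example above, and note that this only returns inline policies, not managed ones):

import boto3

iam = boto3.client('iam')
# Retrieves the named inline policy embedded in the user.
response = iam.get_user_policy(UserName='Bob', PolicyName='ExamplePolicy')
print(response['PolicyDocument'])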
Console Approach:
Log in to the AWS Console and search for the IAM service.
Under the Users section, search for the user whose permissions need to be checked.
Then, in the Permissions section, you can view all the permissions.
I am using the Node library to integrate my application with BigQuery. I plan to accept a projectId, email, and private key from the user, and then validate the credentials by making a call to the getDataset operation with limit 1. This should ensure that all three parameters passed by the user are proper.
But then I realized that even if I pass a different, valid project ID, my call to getDataset still passes: the operation gets datasets from that project. So I was wondering whether the service account is simply not linked to a project. Any idea how I can validate these three parameters?
A service account key has some attributes inside it, including project_id, private_key, client_email and many others. On this page you can find how to configure the credentials to use the client libraries.
Basically, the first step is creating a service account and downloading a JSON key (I suppose you have already completed this step).
Then you need to set an environment variable in your system so your application can access the credentials.
For Linux/Mac you can do that running:
export GOOGLE_APPLICATION_CREDENTIALS="[PATH]"
For Windows (using CMD):
set GOOGLE_APPLICATION_CREDENTIALS=[PATH]
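As for validating the three parameters against each other: the downloaded key itself contains project_id and client_email, so one option is to compare the user-supplied values to the fields inside the key before making any API call. A rough sketch in Python (the question uses Node, but the idea carries over; user_project_id and user_email stand for the values your user supplied):

import json

with open('key.json') as f:   # the key file downloaded above
    key = json.load(f)

# user_project_id and user_email are assumed inputs from your user.
if key['project_id'] != user_project_id:
    raise ValueError('project ID does not match the service account key')
if key['client_email'] != user_email:
    raise ValueError('email does not match the service account key')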
I'm writing a script to automatically rotate AWS Access Keys on Developer laptops. The script runs in the context of the developer using whichever profile they specify from their ~/.aws/credentials file.
The problem is that if they have two API keys associated with their IAM user account, I cannot create a new key pair until I delete an existing one. However, if I delete whichever key the script is using (which probably comes from the ~/.aws/credentials file, but might come from environment variables or session tokens or something), the script won't be able to create a new key. Is there a way to determine, within Python, which AWS Access Key ID is being used to sign boto3 API calls?
My fall back is to parse the ~/.aws/credentials file, but I'd rather a more robust solution.
Create a default boto3 session and retrieve the credentials:
import boto3

print(boto3.Session().get_credentials().access_key)
That said, I'm not necessarily a big fan of the approach that you are proposing. Both keys might legitimately be in use. I would prefer a strategy that notified users of multiple keys, asked them to validate their usage, and suggested they deactivate or delete keys that are no longer in use.
You can also use IAM's get_access_key_last_used() to retrieve information about when the specified access key was last used.
Maybe it would be reasonable to delete keys that are a) inactive and b) haven't been used in N days, but I think that's still a stretch and would require careful handling and awareness among your users.
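Putting those pieces together, a minimal sketch (it assumes the credentials in use belong to an IAM user rather than a role):

import boto3

session = boto3.Session()
in_use = session.get_credentials().access_key   # the key signing this script's calls

iam = session.client('iam')
# get_access_key_last_used also reports which user owns the key.
user = iam.get_access_key_last_used(AccessKeyId=in_use)['UserName']

for meta in iam.list_access_keys(UserName=user)['AccessKeyMetadata']:
    last = iam.get_access_key_last_used(AccessKeyId=meta['AccessKeyId'])
    print(meta['AccessKeyId'],
          meta['Status'],
          last['AccessKeyLastUsed'].get('LastUsedDate', 'never used'),
          '<- in use by this script' if meta['AccessKeyId'] == in_use else '')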
The real solution here is to move your users to federated access and 100% use of IAM roles. Thus no long-term credentials anywhere. I think this should be the ultimate goal of all AWS users.
I'm fairly new to Vue.js, but I've built some basic CRUD apps using axios.
What I want to do is use Google Cloud BigQuery to pull in raw data and then display or manipulate it in Vue. My goal is to make a sort of simple data dashboard where you can filter things or display some different insights from a handful of BigQuery queries.
I can install the BigQuery API as a dependency from the Vue GUI. But after that I'm a little lost. How do I import BigQuery into my code? How do I run the example code to fetch some public data?
I'm also unsure how to include the Google credentials. I currently have this line in vue.config.js, but I don't know if it is correct:
process.env.VUE_APP_GOOGLE_APPLICATION_CREDENTIALS = '/Google_Cloud_Key/Sandbox-f6ae6239297e.json'
Given the lack of any resources out there for doing this, I also wonder, should I not be trying to retrieve data this way? Should I make an intermediate API that runs the BigQuery queries and then returns JSON to my Vue app?
In order to make requests to the BigQuery API, you need to use a service account, which belongs to your project and is used by the Google BigQuery Node.js client library to make BigQuery API requests.
First, set an environment variable with the PROJECT_ID you will use:
export GOOGLE_CLOUD_PROJECT=$(gcloud config get-value core/project)
Next, create a new service account to access the BigQuery API by using:
gcloud iam service-accounts create my-bigquery-sa --display-name "my bigquery service account"
Next, create credentials that your code will use to log in as your new service account. Create these credentials and save them as a JSON file, ~/key.json, by using the following command:
gcloud iam service-accounts keys create ~/key.json --iam-account my-bigquery-sa@${GOOGLE_CLOUD_PROJECT}.iam.gserviceaccount.com
Set the GOOGLE_APPLICATION_CREDENTIALS environment variable, which the BigQuery client library uses to find your credentials. It should be set to the full path of the credentials JSON file you created:
export GOOGLE_APPLICATION_CREDENTIALS="/home/${USER}/key.json"
You can read more about authenticating the BigQuery API.
In the samples/ directory of the client library you can also find a lot of examples, such as Extract Table JSON, Get Dataset and many more.
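For instance, here is how to initialize a client and perform a query on a BigQuery public dataset. The thread is about the Node.js client, but as a minimal sketch the same steps are shown below with the google-cloud-bigquery Python client (the public table queried is just an illustration):

from google.cloud import bigquery

# Picks up GOOGLE_APPLICATION_CREDENTIALS set in the previous step.
client = bigquery.Client()

query = """
    SELECT name, SUM(number) AS total
    FROM `bigquery-public-data.usa_names.usa_1910_2013`
    GROUP BY name
    ORDER BY total DESC
    LIMIT 10
"""
for row in client.query(query).result():
    print(row.name, row.total)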
I hope you find the above pieces of information useful.