Trying to create a table in a MinIO bucket using Databricks.
spark.sql("create database if not exists minio_db_1 managed location 's3a://my-bucket/minio_db_1'");
I am passing the S3 configuration via the Spark context:
access_key = 'XXXX'
secret_key = 'XXXXXXX'
sc._jsc.hadoopConfiguration().set("fs.s3a.access.key", access_key)
sc._jsc.hadoopConfiguration().set("fs.s3a.secret.key", secret_key)
sc._jsc.hadoopConfiguration().set("fs.s3a.endpoint", "http://my-ip:9000")
Can anyone please point out which configs are missing here for table creation?
With this config I am able to write data to S3 using
df.write.format("parquet").save("s3a://my-bucket/file-path");
But it throws an exception when I try to create a table/database:
spark.sql("create database if not exists minio_db_1 managed location 's3a://my-bucket/minio_db_1'");
AnalysisException: org.apache.hadoop.hive.ql.metadata.HiveException: MetaException(message:Got exception: java.nio.file.AccessDeniedException s3a://my-bucket/my-database: getFileStatus on s3a://test2/minio_db_1: com.amazonaws.services.s3.model.AmazonS3Exception: Forbidden; request: HEAD https://test2.s3.us-east-1.amazonaws.com minio_db_1 {} Hadoop 3.3.4, aws-sdk-java/1.12.189 Linux/5.4.0-1093-aws OpenJDK_64-Bit_Server_VM/25.345-b01 java/1.8.0_345 scala/2.12.14 vendor/Azul_Systems,_Inc. cfg/retry-mode/legacy com.amazonaws.services.s3.model.GetObjectMetadataRequest; Request ID: 6YBEAZY59EYGAEVB, Extended Request ID: o+h6YBGczQmWsnFMW8kLGi+llJ+v3ysqoz05fnNYTH901+ACgmi5x50dE2ekXbNrr3qQf81uOx8=, Cloud Provider: AWS, Instance ID: i-072d1969af3c17cb6 (Service: Amazon S3; Status Code: 403; Error Code: 403 Forbidden; Request ID: 6YBEAZY59EYGAEVB; S3 Extended Request ID: o+h6YBGczQmWsnFMW8kLGi+llJ+v3ysqoz05fnNYTH901+ACgmi5x50dE2ekXbNrr3qQf81uOx8=; Proxy: null), S3 Extended Request ID: o+h6YBGczQmWsnFMW8kLGi+llJ+v3ysqoz05fnNYTH901+ACgmi5x50dE2ekXbNrr3qQf81uOx8=:403 Forbidden)
The request should be routed to the configured s3a endpoint, but it's routing to the generic S3 endpoint. Somehow spark.sql is not honouring the Spark context configuration.
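For what it's worth, the trace shows the metastore's getFileStatus call going to the default AWS endpoint, so the endpoint and credentials set on the driver's Hadoop configuration are apparently not visible to it; putting all of the fs.s3a.* options into the cluster's Spark config as spark.hadoop.* keys (rather than setting them only at runtime) is the usual way to make them visible everywhere. On top of that, MinIO setups typically need the options sketched below, in the same style as above; whether these are the missing piece in this particular case is an assumption.
sc._jsc.hadoopConfiguration().set("fs.s3a.path.style.access", "true")        # MinIO generally requires path-style URLs
sc._jsc.hadoopConfiguration().set("fs.s3a.connection.ssl.enabled", "false")  # the endpoint above is plain http
# s3a also supports per-bucket overrides if only this bucket should go to MinIO:
sc._jsc.hadoopConfiguration().set("fs.s3a.bucket.my-bucket.endpoint", "http://my-ip:9000")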
What I want to achieve is for my pods that live inside GKE to share files. So what I'm thinking is using Google Cloud Storage to write and read the files.
I have created a service account with kubectl:
kubectl create serviceaccount myxxx-svc-account --namespace myxxx
Then I also created the service account in my GCP console
Then, I added the roles/iam.workloadIdentityUser role in my GCP account.
Next, I annotated my Kubernetes service account with my GCP service account:
kubectl annotate serviceaccount --namespace myxxx myxxx-svc-account \
    iam.gke.io/gcp-service-account=myxxx-svc-account@myxxx-xxxxx.iam.gserviceaccount.com
I also added the Storage Admin and Storage Object Admin roles on the GCP IAM page.
Then, in my deployment.yaml, I included my service account:
spec:
  serviceAccountName: myxxx-account
Below is how I try to upload a file to the storage:
const {Storage} = require('@google-cloud/storage');
const storage = new Storage();
const bucket = storage.bucket('bucket-name');
const options = {
destination: '/folder1/folder2/123456789'
};
bucket.upload("./index.js", options, function(uploadError, file, apiResponse) {
  // uploadError is null when the upload succeeds, so guard before logging it
  if (uploadError) {
    console.log(uploadError.message)
    console.log(uploadError.stack)
  }
});
I deploy my Node application to the GKE pods through Docker. In the Dockerfile, I'm using:
FROM node
...
...
...
CMD ["node", "index.js"]
But I always get an unauthorized 403 error:
Could not refresh access token: A Forbidden error was returned while
attempting to retrieve an access token for the Compute Engine built-in
service account. This may be because the Compute Engine instance does
not have the correct permission scopes specified: Could not refresh
access token: Unsuccessful response status code. Request failed with
status code 403
Error: Could not refresh access token: A Forbidden
error was returned while attempting to retrieve an access token for
the Compute Engine built-in service account. This may be because the
Compute Engine instance does not have the correct permission scopes
specified: Could not refresh access token: Unsuccessful response
status code. Request failed with status code 403
at Gaxios._request (/opt/app/node_modules/gaxios/build/src/gaxios.js:130:23)
at process.processTicksAndRejections (node:internal/process/task_queues:95:5)
at async metadataAccessor (/opt/app/node_modules/gcp-metadata/build/src/index.js:68:21)
at async Compute.refreshTokenNoCache (/opt/app/node_modules/google-auth-library/build/src/auth/computeclient.js:54:20)
at async Compute.getRequestMetadataAsync (/opt/app/node_modules/google-auth-library/build/src/auth/oauth2client.js:298:17)
at async Compute.requestAsync (/opt/app/node_modules/google-auth-library/build/src/auth/oauth2client.js:371:23)
at async Upload.makeRequest (/opt/app/node_modules/@google-cloud/storage/build/src/resumable-upload.js:574:21)
at async retry.retries (/opt/app/node_modules/@google-cloud/storage/build/src/resumable-upload.js:306:29)
at async Upload.createURIAsync (/opt/app/node_modules/@google-cloud/storage/build/src/resumable-upload.js:303:21)
What am I doing wrong? It seems like I have already granted the permissions. How can I troubleshoot it? Is it related to the Docker image?
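A hedged note rather than a definite answer: this particular error means the client library ended up asking the node's Compute Engine metadata identity for a token, which is what typically happens when the Workload Identity binding does not apply to the pod. Two things worth re-checking: the deployment above uses serviceAccountName: myxxx-account while the account that was created and annotated is myxxx-svc-account, and the GCP service account needs the workloadIdentityUser binding for the exact namespace/KSA pair, normally created with something like the following (the names are copied from the question, so treat them as placeholders):
gcloud iam service-accounts add-iam-policy-binding myxxx-svc-account@myxxx-xxxxx.iam.gserviceaccount.com \
    --role roles/iam.workloadIdentityUser \
    --member "serviceAccount:myxxx-xxxxx.svc.id.goog[myxxx/myxxx-svc-account]"
Workload Identity also has to be enabled on the cluster and on the node pool itself for the annotation to take effect.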
I am a free-tier AWS user. I created a group in IAM and created some users; as the root user I created a policy with no delete permissions and applied it to the group I created.
But when I access RDS as a user I added to that group, I am not able to create any database. I get the following error: "You are not authorized to perform this operation. (Service: AmazonEC2; Status Code: 403; Error Code: UnauthorizedOperation; Request ID: 80839f5f-d08c-435d-850f-7ab185421d35; Proxy: null)"
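A hedged guess based on the error: the failure is reported by AmazonEC2, not RDS, because creating a database instance also triggers EC2 Describe calls (VPCs, subnets, security groups, availability zones), so the group's policy needs those read-only EC2 actions in addition to the RDS ones. A minimal sketch of attaching such a policy with boto3; the policy and group names are made up for illustration:
import json
import boto3

# Keep whatever non-delete RDS permissions the existing policy already grants;
# the piece that is usually missing is the EC2 read-only lookup below.
policy_document = {
    "Version": "2012-10-17",
    "Statement": [
        {"Effect": "Allow", "Action": ["ec2:Describe*"], "Resource": "*"},
    ],
}

iam = boto3.client("iam")
policy = iam.create_policy(
    PolicyName="rds-create-with-ec2-describe",   # hypothetical name
    PolicyDocument=json.dumps(policy_document),
)
iam.attach_group_policy(
    GroupName="my-rds-group",                    # hypothetical group name
    PolicyArn=policy["Policy"]["Arn"],
)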
I am trying to connect to an AWS Redshift cluster in one account from an AWS Glue setup in another account. I have set up the cross-account connectivity via an IAM role trust entity, and it is working fine.
I am able to get the Redshift cluster credentials via STS. But after creating the boto3 client for Redshift using the temporary credentials, I get the error below while creating the connection.
InterfaceError: ('communication error', TimeoutError(110, 'Connection timed out')).
Below is my setup and my simple code:
# Note: this is an excerpt from a larger function; the imports (boto3, redshift_connector),
# logger, _get_sts_credentials() and the redshift_* variables are defined elsewhere,
# and the except clause matching the try below is not shown.
assume_role_response = _get_sts_credentials()
if 'Credentials' in assume_role_response:
    assumed_session = boto3.Session(
        aws_access_key_id=assume_role_response['Credentials']['AccessKeyId'],
        aws_secret_access_key=assume_role_response['Credentials']['SecretAccessKey'],
        aws_session_token=assume_role_response['Credentials']['SessionToken'])
    client = assumed_session.client('redshift')
    logger.info('Getting temp Redshift credentials')
    try:
        redshift_temp_credentials = client.get_cluster_credentials(DbUser=redshift_user_name,
                                                                   DbName=redshift_database,
                                                                   ClusterIdentifier=redshift_cluster_id,
                                                                   AutoCreate=False)
        print('temp username is {} and password is {}'.format(redshift_temp_credentials['DbUser'],
                                                              redshift_temp_credentials['DbPassword']))
        # The InterfaceError / connection timeout is raised on this connect call.
        connection = redshift_connector.connect(host=redshift_host,
                                                database=redshift_database,
                                                user=redshift_temp_credentials['DbUser'],
                                                password=redshift_temp_credentials['DbPassword'])
        return connection
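_get_sts_credentials() is not shown above; presumably it wraps an sts.assume_role call roughly like the sketch below (the role ARN and session name are placeholders, not the real values):
import boto3

def _get_sts_credentials():
    # Sketch of the helper; the actual implementation in the script may differ.
    sts = boto3.client('sts')
    return sts.assume_role(
        RoleArn='arn:aws:iam::<redshift-account-id>:role/<cross-account-role>',  # placeholder
        RoleSessionName='glue-redshift-session'                                  # placeholder
    )
Separately, the InterfaceError with "Connection timed out" is raised at the network layer when redshift_connector tries to reach the cluster endpoint, so connectivity between the Glue environment and the cluster (security groups, routing, whether the cluster is publicly accessible) is worth checking independently of the credential handling.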
I want to upload small image and audio files from a form to S3, using Postman for testing. I successfully uploaded files to an AWS S3 bucket from my application running on my local machine. Following is the part of the code I used for file uploading:
import os
import uuid
import boto3

# AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY, S3_BUCKET_NAME, image_file_extension
# and FileEnum are defined elsewhere in the app.
s3_client = boto3.client('s3',aws_access_key_id =AWS_ACCESS_KEY_ID,aws_secret_access_key = AWS_SECRET_ACCESS_KEY,)

async def save_file_static_folder(file, endpoint, user_id):
    _, ext = os.path.splitext(file.filename)
    raw_file_name = f'{uuid.uuid4().hex}{ext}'
    # Save image file in folder
    if ext.lower() in image_file_extension:
        relative_file_folder = user_id + '/' + endpoint
        contents = await file.read()
        try:
            response = s3_client.put_object(Bucket=S3_BUCKET_NAME,
                                            Key=relative_file_folder + '/' + raw_file_name,
                                            Body=contents)
        except Exception:
            return FileEnum.ERROR_ON_INSERT
I called this function from another endpoint; the form data (e.g. name, date of birth and other details) is successfully saved in the MongoDB database and the files are uploaded to the S3 bucket.
This app uses FastAPI, and files are uploaded to the S3 bucket when the app runs on my local machine.
The same app is deployed to AWS Lambda with an S3 bucket as storage. For handling the whole app, the following is added in the endpoint file:
handler = Mangum(app)
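For context, that handler line is the standard Mangum pattern; a minimal sketch of the endpoint file (assuming the FastAPI app object lives in the same module) looks like this:
from fastapi import FastAPI
from mangum import Mangum

app = FastAPI()

# Lambda entry point: Mangum adapts API Gateway / Lambda events to the ASGI app.
handler = Mangum(app)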
After deploying the app in AWS (the Lambda function was created from the AWS root user account), files did not get uploaded to the S3 bucket.
If I do not provide files with the form, the AWS API endpoint works successfully: the form data gets stored in the MongoDB database (MongoDB Atlas) and the app works fine hosted on Lambda.
The app deployed as a Lambda function works successfully except for the file uploads in the form. On my local machine, the file uploads to S3 succeed.
EDIT
While tracing in CloudWatch I got the following error:
exception An error occurred (InvalidAccessKeyId) when calling the PutObject operation: The AWS Access Key Id you provided does not exist in our records.
I checked the AWS Access Key Id and secret key many times; they are correct, and they are the root user credentials.
It looks like you have configured your Lambda function with an execution IAM role, but you are overriding the AWS credentials supplied to the boto3 SDK here:
s3_client = boto3.client('s3',aws_access_key_id =AWS_ACCESS_KEY_ID,aws_secret_access_key = AWS_SECRET_ACCESS_KEY,)
You don't need to provide credentials explicitly because the boto3 SDK (and all language SDKs) will automatically retrieve credentials dynamically for you. So, ensure that your Lambda function is configured with the correct IAM role, and then change your code as follows:
s3_client = boto3.client('s3')
As an aside, you indicated that you may be using AWS root credentials. It's generally a best security practice in AWS to not use root credentials. Instead, create IAM roles and IAM users.
We strongly recommend that you do not use the root user for your everyday tasks, even the administrative ones. Instead, adhere to the best practice of using the root user only to create your first IAM user. Then securely lock away the root user credentials and use them to perform only a few account and service management tasks.
I am setting up some file transfer scripts and am using boto3 to do this.
I need to send some files from local to a third-party AWS account (cross-account). I have a role set up on the other account with permissions to write to the bucket, and I assigned this role to a user on my account.
I am able to do this with no problem via the CLI, but boto3 keeps kicking out an AccessDenied error for the bucket.
I have read through the boto3 docs on this area, and have set up the credential and config files as they are supposed to be (I assume they are correct, as the CLI approach works), but I am unable to get this working.
Credential File:-
[myuser]
aws_access_key_id = XXXXXXXXXXXXXXXXXXXXXX
aws_secret_access_key = XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
Config File:-
[profile crossaccount]
region = eu-west-2
source_profile=myuser
role_arn = arn:aws:iam::0123456789:role/crossaccountrole
and here is the code I am trying to get working with this:-
import boto3

# set up variables
bucket_name = 'otheraccountbucket'
file_name = 'C:\\Users\\test\\testfile.csv'
object_name = 'testfile.csv'

# create a boto3 session with the profile name so the assume-role call is made with the correct credentials
session = boto3.Session(profile_name='crossaccount')

# create an s3 client from that profile-based session
s3_client = session.client('s3')

# try to upload the file
response = s3_client.upload_file(
    file_name, bucket_name, object_name,
    ExtraArgs={'ACL': 'bucket-owner-full-control'}
)
EDIT:
In response to John's multi-part permission comment, I have tried to upload via the put_object method to bypass this - but I am still getting AccessDenied, now on the PutObject permission - which I have confirmed is in place:-
#set-up variables
bucket_name = 'otheraccountbucket'
file_name = 'C:\\Users\\test\\testfile.csv'
object_name = 'testfile.csv'
#create a boto session with profile name for assume role call to be made with correct credentials
session = boto3.Session(profile_name='crossaccount')
#Create s3_client from that profile based session
s3_client = session.client('s3')
#try and upload the file
with open(file_name, 'rb') as fd:
    response = s3_client.put_object(
        ACL='bucket-owner-full-control',
        Body=fd,
        Bucket=bucket_name,
        ContentType='text/csv',
        Key=object_name
    )
crossaccountrole has PutObject permissions - the error is:-
An error occurred (AccessDenied) when calling the PutObject operation: Access Denied
END EDIT
Here is the working aws-cli command:-
aws s3 cp "C:\Users\test\testfile.csv" s3://otheraccountbucket --profile crossaccount
I am expecting this to upload correctly, as the equivalent CLI command does, but instead I get an S3UploadFailedError exception - An error occurred (AccessDenied) when calling the CreateMultipartUpload operation: Access Denied
Any help would be much appreciated.
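One quick check that is sometimes useful when the CLI works but boto3 does not: print the identity boto3 actually resolves for that profile and compare it with aws sts get-caller-identity --profile crossaccount. Both should report the assumed crossaccountrole ARN rather than the base myuser identity; a small sketch:
import boto3

# Shows which principal the 'crossaccount' profile resolves to for boto3.
session = boto3.Session(profile_name='crossaccount')
print(session.client('sts').get_caller_identity()['Arn'])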
I had this same problem; my issue ended up being that I had the AWS CLI configured with different credentials than my Python app where I was trying to use boto3 to upload files into an S3 bucket.
Here's what worked for me; this only applies to people who have the AWS CLI installed:
Open your command line or terminal
Type aws configure
Enter the ID & Secret key of the IAM user you are using for your python boto3 app when prompted
Run your python app and test boto3, you should no longer get the access denied message