I would like some help or opinions on how to proceed with this Python code. The idea is that it is started by an AWS Lambda trigger, takes the new event (file) from S3, and sends it to my SFTP server.
Code:
import boto3
import urllib.parse
import os
from ftplib import FTP_TLS
#Info server SFTP GCore
FTP_HOST = 'valueFTPHOST'
FTP_USER = 'valueFTPUSER'
FTP_PWD = 'valueFTPPASS'
FTP_PORT = 2200
FTP_PATH = 'path'
print('Loading function')
s3 = boto3.client('s3')
def lambda_handler(event, context):
    if event and event['Records']:
        for record in event['Records']:
            sourcebucket = record['s3']['bucket']['name']
            sourcekey = record['s3']['object']['key']

            # Download in /tmp/
            filename = os.path.basename(sourcekey)
            download_path = '/tmp/' + filename
            print("printcaminho", download_path)
            s3.download_file(sourcebucket, sourcekey, download_path)
            print("printcaminho com o arquivo", download_path)

            os.chdir("/tmp/")
            with FTP_TLS(FTP_HOST, FTP_PORT, FTP_USER, FTP_PWD) as ftps, open(filename, 'rb') as file:
                ftps.storbinary(f'STOR {FTP_PATH}{file.name}', file)

            # Cleaning /tmp/
            os.remove(filename)
Return:
Response
{
"errorMessage": "2022-08-24T14:16:18.627Z 2eed320e-7e94-4f96-9127-bab6a1ebb1cf Task timed out after 3.01 seconds"
}
The code itself doesn't raise any errors, but I get the SFTP timeout response shown above. Any idea how to proceed?
This is my first question here on Stack Overflow. Sorry if I did something wrong; suggestions for improvement are welcome.
If the Lambda logging includes the phrase "Task timed out", it means the Lambda ran longer than the timeout configured for the function.
Here are some AWS details on troubleshooting Lambda timeouts.
To update the timeout, go to the AWS Management Console > Lambda and click your function. Go to the Configuration tab > General configuration, click Edit, and you'll be able to increase the timeout. Keep in mind that letting the function run longer will increase its cost.
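If you prefer to change it programmatically, here is a minimal sketch using boto3 (the function name and the 60-second value are placeholders):

import boto3

lambda_client = boto3.client("lambda")

# Raise the timeout from the 3-second default to 60 seconds (placeholder value).
lambda_client.update_function_configuration(
    FunctionName="my-s3-to-sftp-function",  # placeholder: use your function's name
    Timeout=60,
)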
I am following this link and getting an error:
How to upload folder on Google Cloud Storage using Python API
I have a saved model in the container environment, and from there I want to copy it to a GCP bucket.
Here is my code:
storage_client = storage.Client(project='*****')

def upload_local_directory_to_gcs(local_path, bucket, gcs_path):
    bucket = storage_client.bucket(bucket)
    assert os.path.isdir(local_path)
    for local_file in glob.glob(local_path + '/**'):
        print(local_file)
        print("this is bucket", bucket)
        blob = bucket.blob(gcs_path)
        print("here")
        blob.upload_from_filename(local_file)
        print("done")

path = "/pythonPackage/trainer/model_mlm_demo"  # this is the local absolute path where my folder is. Folder name is **model_mlm_demo**
buc = "py*****"  # this is my GCP bucket address
gcs = "model_mlm_demo2/"  # this is the new folder that I want to store files in GCP
upload_local_directory_to_gcs(local_path=path, bucket=buc, gcs_path=gcs)
/pythonPackage/trainer/model_mlm_demo has 3 files in it: config, model.bin and arguments.bin.
ERROR
The code doesn't throw any error, but no files are uploaded to the GCP bucket. It just creates an empty folder.
From what I can see, the error is that you don't need to pass gs:// as the bucket parameter. Here is an example you may want to check out:
https://cloud.google.com/storage/docs/uploading-objects#storage-upload-object-python
def upload_blob(bucket_name, source_file_name, destination_blob_name):
    """Uploads a file to the bucket."""
    # The ID of your GCS bucket
    # bucket_name = "your-bucket-name"
    # The path to your file to upload
    # source_file_name = "local/path/to/file"
    # The ID of your GCS object
    # destination_blob_name = "storage-object-name"

    storage_client = storage.Client()
    bucket = storage_client.bucket(bucket_name)
    blob = bucket.blob(destination_blob_name)

    blob.upload_from_filename(source_file_name)

    print(
        "File {} uploaded to {}.".format(
            source_file_name, destination_blob_name
        )
    )
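For example, to push one of the files from the question into the new folder, the destination blob name would include the folder prefix (bucket and file names taken from the question):

upload_blob("py*****", "/pythonPackage/trainer/model_mlm_demo/model.bin", "model_mlm_demo2/model.bin")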
I have reproduced your issue and the code snippet below works fine. I have updated the code based on the folders and names you mentioned in the question; the key change is appending each file's name to gcs_path, so every file gets its own blob instead of repeatedly overwriting a single object. Let me know if you have any issues.
import os
import glob
from google.cloud import storage

storage_client = storage.Client(project='')

def upload_local_directory_to_gcs(local_path, bucket, gcs_path):
    bucket = storage_client.bucket(bucket)
    assert os.path.isdir(local_path)
    for local_file in glob.glob(local_path + '/**'):
        print(local_file)
        print("this is bucket", bucket)
        filename = local_file.split('/')[-1]
        blob = bucket.blob(gcs_path + filename)
        print("here")
        blob.upload_from_filename(local_file)
        print("done")

# this is the local absolute path where my folder is. Folder name is **model_mlm_demo**
path = "/pythonPackage/trainer/model_mlm_demo"
buc = "py*****"  # this is my GCP bucket address
gcs = "model_mlm_demo2/"  # this is the new folder that I want to store files in GCP
upload_local_directory_to_gcs(local_path=path, bucket=buc, gcs_path=gcs)
I just came across the gcsfs library, which also offers a nicer interface for this. You can copy an entire directory into a GCS location like this:

import gcsfs

def upload_to_gcs(src_dir: str, gcs_dst: str):
    fs = gcsfs.GCSFileSystem()
    fs.put(src_dir, gcs_dst, recursive=True)
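For example, with the paths from the question (the bucket name is left as the placeholder used above), the call could look like this:

upload_to_gcs("/pythonPackage/trainer/model_mlm_demo", "py*****/model_mlm_demo2")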
I figured out a way using subprocess to upload model artefacts to a GCP bucket.
import subprocess
subprocess.call('gsutil cp -r source_folder_in_local gs://*****/folder_name', shell=True, stdout=subprocess.PIPE)
If gsutil is not installed, you can install it using this link:
https://cloud.google.com/storage/docs/gsutil_install
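As a side note, here is a sketch of the same call without shell=True, which avoids shell quoting issues (the paths are the placeholders from above):

import subprocess

subprocess.run(
    ["gsutil", "cp", "-r", "source_folder_in_local", "gs://*****/folder_name"],
    check=True,
)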
I am still learning Python (3.6) and am now working on AWS. I am trying to automate a process wherein a user runs a query in Athena. The results of the query are written to an S3 bucket. From S3, I need to pull the file down to my local machine and then run some more analysis using legacy tools. Currently, all of this is done manually, step by step, starting by running the query in the Athena Query Editor.
The problem I am facing is that the file(s) will be larger than 10 GB and the SAML profile token expires after 1 hour. I have read some documentation about auto-refreshing the credentials; however, how would I even implement a solution like that while the file is being downloaded? I have put my code below (that's the closest I got to a successful run, with about 10,000 records).
Any suggestions or help would be appreciated.
import boto3
from boto3.s3.transfer import TransferConfig
import pandas as pd
import time
pd.set_option('display.max_rows', 50)
pd.set_option('display.max_columns', 100)
pd.set_option('display.width', 1000)
session=boto3.Session(profile_name='saml')
athena_client = session.client("athena")
query_response = athena_client.start_query_execution(
    QueryString="SELECT * FROM TABLENAME WHERE=<condition>",
    QueryExecutionContext={"Database": 'some_db'},
    ResultConfiguration={
        "OutputLocation": 's3://131653427868-heor-epi-workbench-results',
        "EncryptionConfiguration": {"EncryptionOption": "SSE_S3"},
    },
    WorkGroup='myworkgroup'
)
print(query_response)

iteration = 30
temp_file_location: str = "C:\\Users\\<user>\\Desktop\\Python Projects\\tablename.csv"

while iteration > 0:
    iteration = iteration - 1
    print(iteration)
    query_response_id = athena_client.get_query_execution(QueryExecutionId=query_response['QueryExecutionId'])
    print(query_response_id)
    if (query_response_id['QueryExecution']['Status']['State'] == 'FAILED') or (query_response_id['QueryExecution']['Status']['State'] == 'CANCELLED'):
        print("IF BLOCK: ", query_response_id['QueryExecution']['Status']['State'])
        print("The Query Failed.")
    elif query_response_id['QueryExecution']['Status']['State'] == 'SUCCEEDED':
        print("ELSE IF BLOCK: ", query_response_id['QueryExecution']['Status']['State'])
        print("Query Completed. Ready to download.")
        print("Proceeding to Download File......")
        config = TransferConfig(max_concurrency=5)
        s3_client = session.client("s3")
        s3_client.download_file('131653427868-heor-epi-workbench-results',
                                f"{query_response['QueryExecutionId']}.csv",
                                temp_file_location,
                                Config=config
                                )
        print("Download complete. Setting Iteration to 0 to exit loop. ")
        iteration = 0
    else:
        print("ELSE BLOCK: ", query_response_id['QueryExecution']['Status']['State'])
        print(query_response_id['QueryExecution']['Status']['State'])
        time.sleep(10)
pandasDF=pd.read_csv(temp_file_location)
print(pandasDF)
I have to write an AWS Lambda function in Python using boto3. The main aim of the function is to detect all the unhealthy WorkSpaces in a directory and reboot the WorkSpaces whose state is unhealthy.
I have created a CloudWatch alarm which triggers an SNS topic, which in turn triggers the Lambda.
I have no idea how to iterate through the WorkSpaces in a directory using Python so that I can detect the unhealthy state.
Can anybody please provide me with sample code in Python so that I can write the Lambda?
Thanks
import json
import boto3

client = boto3.client('workspaces')

def lambda_handler(event, context):
    statusCode = 200
    print("Alarm activated")
    DirectoryId = "d-966714f11"
    UnhealthyWorkspace = []
    if(DirectoryId == 'd-966714f114'):
        response = client.describe_workspaces(
            WorkspaceIds = (should be in an array)
        )
        us = response["Contents"]
        for i in us:
            if(State == 'Unhealthy'):
                print(i)
                UnhealthyWorkspace.append(i)
    response1 = client.reboot_workspaces(
        RebootWorkspaceRequests=[
            {
                'WorkspaceId' : UnhealthyWorkspace
            }
        ]
    )
Use describe_workspaces() to retrieve a list of all WorkSpaces.
Then, loop through the list of WorkSpaces and check for State == 'UNHEALTHY'.
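Here is a minimal sketch of that approach as a Lambda handler, assuming the directory ID from the question (error handling is omitted):

import boto3

client = boto3.client('workspaces')

def lambda_handler(event, context):
    directory_id = "d-966714f114"  # directory ID taken from the question
    unhealthy_ids = []

    # describe_workspaces is paginated, so walk every page for the directory
    paginator = client.get_paginator('describe_workspaces')
    for page in paginator.paginate(DirectoryId=directory_id):
        for workspace in page['Workspaces']:
            if workspace['State'] == 'UNHEALTHY':
                unhealthy_ids.append(workspace['WorkspaceId'])

    # Reboot only the WorkSpaces that were found to be unhealthy
    if unhealthy_ids:
        client.reboot_workspaces(
            RebootWorkspaceRequests=[{'WorkspaceId': ws_id} for ws_id in unhealthy_ids]
        )

    return {'statusCode': 200, 'rebooted': unhealthy_ids}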
I'm creating a simple Python function in Google Cloud but cannot get it to save. It shows this error:
"Function failed on loading user code. This is likely due to a bug in the user code. Error message: Error: please examine your function logs to see the error cause: https://cloud.google.com/functions/docs/monitoring/logging#viewing_logs. Additional troubleshooting documentation can be found at https://cloud.google.com/functions/docs/troubleshooting#logging. Please visit https://cloud.google.com/functions/docs/troubleshooting for in-depth troubleshooting documentation."
The logs don't seem to show much that would indicate an error in the code. I followed this guide: https://blog.thereportapi.com/automate-a-daily-etl-of-currency-rates-into-bigquery/
The only differences are the environment variables and the endpoint I'm using.
The code is below; it's just a GET request followed by a push of the data into a table.
import requests
import json
import time
import os
from google.cloud import bigquery

# Set any default values for these variables if they are not found from Environment variables
PROJECT_ID = os.environ.get("PROJECT_ID", "xxxxxxxxxxxxxx")
EXCHANGERATESAPI_KEY = os.environ.get("EXCHANGERATESAPI_KEY", "xxxxxxxxxxxxxxx")
REGIONAL_ENDPOINT = os.environ.get("REGIONAL_ENDPOINT", "europe-west1")
DATASET_ID = os.environ.get("DATASET_ID", "currency_rates")
TABLE_NAME = os.environ.get("TABLE_NAME", "currency_rates")
BASE_CURRENCY = os.environ.get("BASE_CURRENCY", "SEK")
SYMBOLS = os.environ.get("SYMBOLS", "NOK,EUR,USD,GBP")

def hello_world(request):
    latest_response = get_latest_currency_rates()
    write_to_bq(latest_response)
    return "Success"

def get_latest_currency_rates():
    PARAMS = {'access_key': EXCHANGERATESAPI_KEY, 'symbols': SYMBOLS, 'base': BASE_CURRENCY}
    response = requests.get("https://api.exchangeratesapi.io/v1/latest", params=PARAMS)
    print(response.json())
    return response.json()

def write_to_bq(response):
    # Instantiates a client
    bigquery_client = bigquery.Client(project=PROJECT_ID)

    # Prepares a reference to the dataset
    dataset_ref = bigquery_client.dataset(DATASET_ID)
    table_ref = dataset_ref.table(TABLE_NAME)
    table = bigquery_client.get_table(table_ref)

    # get the current timestamp so we know how fresh the data is
    timestamp = time.time()

    jsondump = json.dumps(response)  # Returns a string

    # Ensure the Response is a String not JSON
    rows_to_insert = [{"timestamp": timestamp, "data": jsondump}]

    errors = bigquery_client.insert_rows(table, rows_to_insert)  # API request
    print(errors)
    assert errors == []
I tried just the part that does the GET request in an offline editor and I can confirm the response works fine. I suspect it might have something to do with permissions or the way the script tries to access the database.
I am using the Google Python sample script to upload videos to YouTube.
#!/usr/bin/python
import http.client #httplib
import httplib2
import os
import random
import sys
import time
from apiclient.discovery import build
from apiclient.errors import HttpError
from apiclient.http import MediaFileUpload
from oauth2client.client import flow_from_clientsecrets
from oauth2client.file import Storage
from oauth2client.tools import argparser, run_flow
# Explicitly tell the underlying HTTP transport library not to retry, since
# we are handling retry logic ourselves.
httplib2.RETRIES = 1
# Maximum number of times to retry before giving up.
MAX_RETRIES = 10
# Always retry when these exceptions are raised.
RETRIABLE_EXCEPTIONS = (httplib2.HttpLib2Error, IOError, http.client.NotConnected,
                        http.client.IncompleteRead, http.client.ImproperConnectionState,
                        http.client.CannotSendRequest, http.client.CannotSendHeader,
                        http.client.ResponseNotReady, http.client.BadStatusLine)
# Always retry when an apiclient.errors.HttpError with one of these status
# codes is raised.
RETRIABLE_STATUS_CODES = [500, 502, 503, 504]
# The CLIENT_SECRETS_FILE variable specifies the name of a file that contains
# the OAuth 2.0 information for this application, including its client_id and
# client_secret. You can acquire an OAuth 2.0 client ID and client secret from
# the Google Developers Console at
# https://console.developers.google.com/.
# Please ensure that you have enabled the YouTube Data API for your project.
# For more information about using OAuth2 to access the YouTube Data API, see:
# https://developers.google.com/youtube/v3/guides/authentication
# For more information about the client_secrets.json file format, see:
# https://developers.google.com/api-client-library/python/guide/aaa_client_secrets
CLIENT_SECRETS_FILE = "client_secrets.json"
# This OAuth 2.0 access scope allows an application to upload files to the
# authenticated user's YouTube channel, but doesn't allow other types of access.
YOUTUBE_UPLOAD_SCOPE = "https://www.googleapis.com/auth/youtube.upload"
YOUTUBE_API_SERVICE_NAME = "youtube"
YOUTUBE_API_VERSION = "v3"
# This variable defines a message to display if the CLIENT_SECRETS_FILE is
# missing.
MISSING_CLIENT_SECRETS_MESSAGE = """
WARNING: Please configure OAuth 2.0
To make this sample run you will need to populate the client_secrets.json file
found at:
%s
with information from the Developers Console
https://console.developers.google.com/
For more information about the client_secrets.json file format, please visit:
https://developers.google.com/api-client-library/python/guide/aaa_client_secrets
""" % os.path.abspath(os.path.join(os.path.dirname(__file__),
CLIENT_SECRETS_FILE))
VALID_PRIVACY_STATUSES = ("public", "private", "unlisted")
def get_authenticated_service(args):
    flow = flow_from_clientsecrets(CLIENT_SECRETS_FILE,
                                   scope=YOUTUBE_UPLOAD_SCOPE,
                                   message=MISSING_CLIENT_SECRETS_MESSAGE)

    storage = Storage("%s-oauth2.json" % sys.argv[0])
    credentials = storage.get()

    if credentials is None or credentials.invalid:
        credentials = run_flow(flow, storage, args)

    return build(YOUTUBE_API_SERVICE_NAME, YOUTUBE_API_VERSION,
                 http=credentials.authorize(httplib2.Http()))

def initialize_upload(youtube, options):
    tags = None
    if options.keywords:
        tags = options.keywords.split(",")

    body = dict(
        snippet=dict(
            title=options.title,
            description=options.description,
            tags=tags,
            categoryId=options.category
        ),
        status=dict(
            privacyStatus=options.privacyStatus
        )
    )

    # Call the API's videos.insert method to create and upload the video.
    insert_request = youtube.videos().insert(
        part=",".join(body.keys()),
        body=body,
        # The chunksize parameter specifies the size of each chunk of data, in
        # bytes, that will be uploaded at a time. Set a higher value for
        # reliable connections as fewer chunks lead to faster uploads. Set a lower
        # value for better recovery on less reliable connections.
        #
        # Setting "chunksize" equal to -1 in the code below means that the entire
        # file will be uploaded in a single HTTP request. (If the upload fails,
        # it will still be retried where it left off.) This is usually a best
        # practice, but if you're using Python older than 2.6 or if you're
        # running on App Engine, you should set the chunksize to something like
        # 1024 * 1024 (1 megabyte).
        media_body=MediaFileUpload(options.file, chunksize=-1, resumable=True)
    )

    resumable_upload(insert_request)
# This method implements an exponential backoff strategy to resume a
# failed upload.
def resumable_upload(insert_request):
    response = None
    error = None
    retry = 0
    while response is None:
        try:
            print("Uploading file...")
            status, response = insert_request.next_chunk()
            if 'id' in response:
                print("Video id '%s' was successfully uploaded." % response['id'])
            else:
                exit("The upload failed with an unexpected response: %s" % response)
        except HttpError as e:
            if e.resp.status in RETRIABLE_STATUS_CODES:
                error = "A retriable HTTP error %d occurred:\n%s" % (e.resp.status,
                                                                     e.content)
            else:
                raise
        except RETRIABLE_EXCEPTIONS as e:
            error = "A retriable error occurred: %s" % e

        if error is not None:
            print(error)
            retry += 1
            if retry > MAX_RETRIES:
                exit("No longer attempting to retry.")

            max_sleep = 2 ** retry
            sleep_seconds = random.random() * max_sleep
            print("Sleeping %f seconds and then retrying..." % sleep_seconds)
            time.sleep(sleep_seconds)
if __name__ == '__main__':
    argparser.add_argument("--file", required=True, help="Video file to upload")
    argparser.add_argument("--title", help="Video title", default="Test Title")
    argparser.add_argument("--description", help="Video description",
                           default="Test Description")
    argparser.add_argument("--category", default="22",
                           help="Numeric video category. " +
                                "See https://developers.google.com/youtube/v3/docs/videoCategories/list")
    argparser.add_argument("--keywords", help="Video keywords, comma separated",
                           default="")
    argparser.add_argument("--privacyStatus", choices=VALID_PRIVACY_STATUSES,
                           default=VALID_PRIVACY_STATUSES[0], help="Video privacy status.")
    args = argparser.parse_args()

    if not os.path.exists(args.file):
        exit("Please specify a valid file using the --file= parameter.")

    youtube = get_authenticated_service(args)
    try:
        initialize_upload(youtube, args)
    except HttpError as e:
        print("An HTTP error %d occurred:\n%s" % (e.resp.status, e.content))
The problem is the --description parameter. It only allows a single line of text, and I need to include several lines with line breaks ('\n'). Is it possible to do this another way?
It would be wonderful if this parameter (or another one) accepted the path of a text file to use as the description, the way the --file parameter does.
Is there something I can do to solve this? Or is there somewhere I can contact the Google developers to ask whether it is possible to reimplement the initialize_upload(youtube, args) function so that it works the way I describe?
Yes, it is possible!!
We have to add a --description-file option.
Google, please write a complete manual for your API!!!
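Here is a sketch of what that change could look like in the script above (the option name and the file-reading logic are assumptions, not part of the original sample):

# In the __main__ block, next to the other argparser.add_argument calls:
argparser.add_argument("--description-file", default=None,
                       help="Path to a text file whose contents are used as the video description")

# After args = argparser.parse_args(), before calling initialize_upload():
if args.description_file:
    with open(args.description_file, "r", encoding="utf-8") as f:
        args.description = f.read()  # multi-line descriptions, including '\n', now work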