Cleaning up AMIs and EBS Snapshots via AWS Lambda - python-3.x

I have created the following Lambda function on my local machine so I can deploy it and run it through a CloudWatch Events cron expression on a daily basis to clean up the desired AMIs and their snapshots. It also takes care of abandoned EBS snapshots.
The criteria for deleting an AMI are: find AMIs that don't have a DoNotDelete:true tag, and if an AMI is more than 7 days old, mark it for deletion. The function exempts any AMI that is currently being used by an AWS Launch Configuration.
I am sure there are a few ways to optimize this Lambda function and code, and I would like to know how I can improve/optimize it further.
import boto3
from datetime import timedelta, datetime, timezone
import logging
import botocore

# Initialize logging
logger = logging.getLogger()
logger.setLevel(logging.INFO)

def ami_cleanup(event, context):
    '''Clean AMIs and their associated SnapShots which are older than 7 days and without a "DoNotDelete=true" tag in an AWS Region.
    Exempt AMIs which are currently being used in an AWS Launch Config.'''
    ec2 = boto3.client('ec2')
    autoscaling = boto3.client('autoscaling')
    ami_response = ec2.describe_images(Owners=['self'])
    snapshot_response = ec2.describe_snapshots(OwnerIds=['self'])
    lc_response = autoscaling.describe_launch_configurations()
    amis = {}
    amidnd = []
    for i in ami_response['Images']:
        for tag in i.get('Tags', ''):
            if 'DoNotDelete' in tag.values():
                amidnd.append(i.get('ImageId'))
                break
    for ami in lc_response['LaunchConfigurations']:
        if ami['ImageId'] not in amidnd:
            amidnd.append(ami['ImageId'])
    for i in ami_response['Images']:
        if i.get('Tags') == None or i['ImageId'] not in amidnd:
            amis[i.get('ImageId')] = i.get('CreationDate')
    if not amis:
        logger.info('No AMIs and SnapShots found to be deregister')
    else:
        for ami, cdate in amis.items():
            if cdate < (datetime.now(timezone.utc) - timedelta(days=7)).isoformat():
                logger.info('De-registering...' + ami)
                ec2.deregister_image(ImageId=ami)
                for snapshot in snapshot_response['Snapshots']:
                    if ami in snapshot.get('Description', ''):
                        logger.info('Deleting ' + snapshot.get('SnapshotId') + " of " + ami)
                        ec2.delete_snapshot(SnapshotId=snapshot.get('SnapshotId'))
            else:
                logger.info('No AMIs and SnapShots found to be older than 7 days')
                break
    abandon_snap_clean(ami_response, snapshot_response)

def abandon_snap_clean(ami_response, snapshot_response):
    '''Clean abandoned EBS snapshots for which no AMI has been found'''
    snapdndids = []
    for i in ami_response['Images']:
        for snap in i['BlockDeviceMappings']:
            if 'Ebs' in snap.keys():
                snapdndids.append(snap['Ebs']['SnapshotId'])
    for snapid in snapshot_response['Snapshots']:
        if snapid['SnapshotId'] not in snapdndids:
            try:
                logger.info('Deleting abandon snapshots ' + snapid['SnapshotId'])
                ec2.delete_snapshot(SnapshotId=snapid['SnapshotId'])
            except botocore.exceptions.ClientError as error:
                if error.response['Error']['Code'] == 'InvalidSnapshot.InUse':
                    logger.info('SnapShotId ' + snapid['SnapshotId'] + ' is already being used by an AMI')
                else:
                    raise error
        else:
            logger.info('No abandon EBS SnapShots found to clean up')
            break
    else:
        logger.info('No SnapShots found')

It does seem that you have a logic issue here: if you come across an image that isn't more than 7 days old, the loop breaks while there could still be other images that are older than 7 days. Switch the break to a continue:
if cdate < (datetime.now(timezone.utc) - timedelta(days=7)).isoformat():
    logger.info('De-registering...' + ami)
    ec2.deregister_image(ImageId=ami)
    for snapshot in snapshot_response['Snapshots']:
        if ami in snapshot.get('Description', ''):
            logger.info('Deleting ' + snapshot.get('SnapshotId') + " of " + ami)
            ec2.delete_snapshot(SnapshotId=snapshot.get('SnapshotId'))
else:
    logger.info('No AMIs and SnapShots found to be older than 7 days')
    continue

Related

How to get list of running VMs from AzureML

I am a beginner with Python and with AzureML.
Currently, my task is to list all the running VMs (or Compute Instances) with their status and, if running, how long they have been running.
I managed to connect to AzureML and list Subscriptions, Resource Groups and Workspaces, but I'm stuck on how to list running VMs now.
Here's the code that I have currently:
# get subscriptions list using credentials
subscription_client = SubscriptionClient(credentials)
sub_list = subscription_client.subscriptions.list()
print("Subscription ID".ljust(column_width) + "Display name")
print(separator)
for group in list(sub_list):
    print(f'{group.subscription_id:<{column_width}}{group.display_name}')
    subscription_id = group.subscription_id
    resource_client = ResourceManagementClient(credentials, subscription_id)
    group_list = resource_client.resource_groups.list()
    print(" Resource Groups:")
    for group in list(group_list):
        print(f" {group.name}{group.location}")
        print(" Workspaces:")
        my_ml_client = Workspace.list(subscription_id, credentials, group.name)
        for ws in list(my_ml_client):
            try:
                print(f" {ws}")
                if ws:
                    compute = ComputeTarget(workspace=ws, name=group.name)
                    print('Found existing compute: ' + group.name)
            except:
                ()
Please note that this is more or less a learning exercise and not the final shape of the code; I will refactor once I get it to work.
Edit: I found an easy way to do this:
workspace = Workspace(
    subscription_id=subscription_id,
    resource_group=group.name,
    workspace_name=ws,
)
print(workspace.compute_targets)
Edit2: If anyone stumbles on this question and is just beginning to understand Python+Azure just like I do, all this information is from official documentation (which is just hard to follow as a beginner).
The result from 'workspace.compute_targets' will contain both Compute Instances and AML Instances.
If you need to retrieve only the VMs (like I do) you need to take an extra step to filter the result like this:
if type(compute_list[vm]) == ComputeInstance:
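To make that concrete, here is a minimal sketch of the filtering step (my own illustration, assuming workspace is an already-authenticated azureml.core.Workspace and that ComputeInstance.get_status() reports the running state):
from azureml.core.compute import ComputeInstance

# workspace.compute_targets returns a dict of {name: ComputeTarget}
compute_list = workspace.compute_targets
for name, target in compute_list.items():
    if isinstance(target, ComputeInstance):
        # get_status() should expose the current state, e.g. 'Running' or 'Stopped'
        status = target.get_status()
        print(name, status.state)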

Is it possible to use SQLite in EFS reliably?

Is it possible to use SQLite on AWS EFS safely? In my reading to determine whether this is viable, there appear to be some allusions that it should be doable, since AWS EFS implemented NFSv4 back in 2017. In practice I am having no luck getting consistent behavior out of it.
Quick Points:
"Just use AWS RDS": Due to issues with other AWS architecture another team has implemented we are trying to work around resource starving cause by the API (DynamoDB isn't an option)
"This goes against SQLite's primary use case (being a locally access DB): Yes, but given the circumstances it seems like the best approach.
I have verified that we are running nfsv4 on our EC2 instance
Current results are very inconsistent with 3 exceptions encountered irrespective of approach I use
"file is encrypted or is not a database"
"disk I/O error (potentially related to EFS open file limits)"
"database disk image is malformed" (The database actually isn't corrupted after this)
database code:
import os
import time
import fcntl

from peewee import *
from playhouse.sqlite_ext import SqliteExtDatabase

SQLITE_VAR_LIMIT = 999
dgm_db_file_name = ''
db = SqliteExtDatabase(None)
lock_file = f'{os.getenv("efs_path", "tmp")}/db_lock_file.lock'

def lock_db_file():
    with open(lock_file, 'w+') as lock:
        limit = 900
        while limit:
            try:
                fcntl.flock(lock, fcntl.LOCK_EX | fcntl.LOCK_NB)
                print(f'db locked')
                break
            except Exception as e:
                print(f'Exception: {str(e)}')
                limit -= 1
                time.sleep(1)
        if not limit:
            raise ValueError(f'Timed out after 900 seconds while waiting for database lock.')

def unlock_db_file():
    with open(lock_file, 'w+') as lock:
        fcntl.flock(lock, fcntl.LOCK_UN)
        print(f'db unlocked')

def initialize_db(db_file_path=dgm_db_file_name):
    print(f'Initializing db ')
    global db
    db.init(db_file_path, pragmas={
        'journal_mode': 'wal',
        'cache_size': -1 * 64000,  # 64MB
        'foreign_keys': 1})
    print(f'db initialized')

class Thing(Model):
    name = CharField(primary_key=True)
    etag = CharField()
    last_modified = CharField()

    class Meta:
        database = db

    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)

    #staticmethod
    def insert_many(stuff):
        data = [(k, v['ETag'], v['Last-Modified']) for k, v in stuff.items()]
        fields = [Thing.name, Thing.etag, Thing.last_modified]
        limit = 900
        while True:
            try:
                with db.atomic():
                    for key_batch in chunked(data, SQLITE_VAR_LIMIT // len(fields)):
                        s = Thing.insert_many(key_batch, fields=[Thing.name, Thing.etag, Thing.last_modified]) \
                            .on_conflict_replace().execute()
                break
            except Exception as e:
                print(f'Exception: {str(e)}')
                print(f'Will try for {limit} more seconds.')
                limit -= 1
                time.sleep(1)
                if not limit:
                    raise ValueError('Failed to exectue query after 900 seconds.')
Example Call:
print(f'Critical section start')
# lock_db_file()  # I have tried with a secondary lock file as well
self.stuff_db = Thing()
if not Path(self.db_file_path).exists():
    initialize_db(self.db_file_path)
    print('creating tables')
    db.create_tables([Thing], safe=True)
else:
    initialize_db(self.db_file_path)
getattr(Thing, insert_many)(self.stuff_db, stuff_db)
# db.close()
# unlock_db_file()
print(f'Critical section end')
print(f'len after update: {len(stuff)}')
Additional peculiarities:
If a Lambda gets stuck catching the "malformed image" exception and a new Lambda execution is triggered, the error resolves in the other Lambda.
After some trial and error I discovered that this is a workable solution. It appears that the design needs to use APSWDatabase(..., vfs='unix-excl') to properly enforce locking.
Database code:
from peewee import *
from playhouse.apsw_ext import APSWDatabase

SQLITE_VAR_LIMIT = 999
db = APSWDatabase(None, vfs='unix-excl')

def initialize_db(db_file_path):
    global db
    db.init(db_file_path, pragmas={
        'journal_mode': 'wal',
        'cache_size': -1 * 64000})
    db.create_tables([Thing], safe=True)
    return Thing()

class Thing(Model):
    field_1 = CharField(primary_key=True)
    field_2 = CharField()
    field_3 = CharField()

    class Meta:
        database = db
This allows for the following usage:
db_model = initialize_db(db_file_path)
with db:
    # Do database queries here with the db_model
    pass
Note: If you don't use the context-managed database connection, you will need to explicitly call db.close(), otherwise the lock will not be released from the file. Additionally, calling db_init(...) causes a lock to be placed on the database until it is closed.
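For the non-context-managed path mentioned in the note above, a rough sketch (reusing the initialize_db and db objects defined earlier) would be:
db_model = initialize_db(db_file_path)
try:
    # Do database queries here with the db_model
    pass
finally:
    # Without the `with db:` block, close explicitly so the unix-excl
    # lock on the EFS-hosted database file is released.
    db.close()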

Identify and reboot workspaces in AWS using Lambda

I have to write an AWS Lambda function in Python using boto3. The main aim of the function is to detect all the unhealthy WorkSpaces in a directory and reboot the WorkSpaces whose state is unhealthy.
I have created a CloudWatch alarm which triggers SNS, which in turn triggers the Lambda.
I have no idea how to iterate through the WorkSpaces in a directory using Python and detect the unhealthy state.
Can anybody please provide me with sample code in Python so that I can write the Lambda?
Thanks
import json
import boto3

client = boto3.client('workspaces')

def lambda_handler(event, context):
    statusCode = 200
    print("Alarm activated")
    DirectoryId = "d-966714f11"
    UnhealthyWorkspace = []
    if(DirectoryId == 'd-966714f114'):
        response = client.describe_workspaces(
            WorkspaceIds = (should be in an array)
        )
        us = response["Contents"]
        for i in us:
            if(State == 'Unhealthy'):
                print(i)
                UnhealthyWorkspace.append(i)
        response1 = client.reboot_workspaces(
            RebootWorkspaceRequests=[
                {
                    'WorkspaceId' : UnhealthyWorkspace
                }
            ]
        )
Use describe_workspaces() to retrieve a list of all Workspaces.
Then, loop through the list of WorkSpaces and check for State == 'UNHEALTHY'.
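A minimal sketch of that approach (my own, untested; the directory ID is a placeholder and the 25-per-call batching for reboot_workspaces is an assumption about the API limit):
import boto3

client = boto3.client('workspaces')

def lambda_handler(event, context):
    directory_id = 'd-xxxxxxxxxx'  # placeholder, use your directory ID
    unhealthy_ids = []

    # describe_workspaces is paginated, so follow NextToken via the paginator
    paginator = client.get_paginator('describe_workspaces')
    for page in paginator.paginate(DirectoryId=directory_id):
        for ws in page['Workspaces']:
            if ws['State'] == 'UNHEALTHY':
                unhealthy_ids.append(ws['WorkspaceId'])

    # Reboot in batches (25 per request, as far as I know)
    for i in range(0, len(unhealthy_ids), 25):
        batch = unhealthy_ids[i:i + 25]
        client.reboot_workspaces(
            RebootWorkspaceRequests=[{'WorkspaceId': wid} for wid in batch]
        )

    return {'rebooted': unhealthy_ids}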

How to check EMR spot instance price history with boto

I'd like to create an EMR cluster programmatically using spot pricing to achieve some cost savings. To do this, I am trying to retrieve EMR spot instance pricing from AWS using boto3, but the only API I'm aware of from Boto3 is the EC2 client's describe_spot_price_history call - https://boto3.amazonaws.com/v1/documentation/api/latest/reference/services/ec2.html#EC2.Client.describe_spot_price_history
The prices from EC2 are not indicative of the pricing for EMR, as seen here - https://aws.amazon.com/emr/pricing/. The values are almost double EMR's.
Is there a way that I can see the spot price history for EMR similar to EC2? I have checked https://boto3.amazonaws.com/v1/documentation/api/latest/reference/services/emr.html and several other pages of documentation from AWS online about this and have found nothing.
Here's a code snippet that I use to check approximate pricing that I can use to bid on EMR instances.
max_bid_price = 0.140
min_bid_price = max_bid_price
az_choice = ''
response = ec2.describe_spot_price_history(
    Filters=[{
        'Name': 'availability-zone',
        'Values': ['us-east-1a', 'us-east-1c', 'us-east-1d']
    },
    {
        'Name': 'product-description',
        'Values': ['Linux/UNIX (Amazon VPC)']
    }],
    InstanceTypes=['r5.2xlarge'],
    EndTime=datetime.now(),
    StartTime=datetime.now()
)
# TODO: Add more Subnets in other AZ's if picking from our existing 3 is an issue
# 'us-east-1b', 'us-east-1e', 'us-east-1f'
for spot_price_history in response['SpotPriceHistory']:
    print(spot_price_history)
    if float(spot_price_history['SpotPrice']) <= min_bid_price:
        min_bid_price = float(spot_price_history['SpotPrice'])
        az_choice = spot_price_history['AvailabilityZone']
The above fails since the prices for EC2 spot instances are a bit higher than what Amazon would charge for the normal hourly amount for EMR on-demand instances. (e.g. on demand for a cluster of that size only costs $0.126/hour, but on demand for EC2 is $0.504/hour and spot instances go for about $0.20/hour).
There's no such thing as EMR spot pricing, as already mentioned in the comments. Spot pricing is for EC2 instances. You can look at the AWS spot advisor page to find out which instance categories have a lower interruption rate, and choose based on that.
Since 2017, AWS has changed the algorithm for spot pricing, "where prices adjust more gradually, based on longer-term trends in supply and demand", so you probably don't need to look at the historical spot prices. More details about that can be found here.
Nowadays, you're most likely gonna be fine using the last price (+ delta) for that instance. This can be achieved using the following code snippet:
from collections import namedtuple
from datetime import datetime, timedelta

import boto3

def get_bid_price(instancetype, aws_region):
    instance_types = [instancetype]
    start = datetime.now() - timedelta(days=1)
    ec2_client = boto3.client('ec2', aws_region)
    price_dict = ec2_client.describe_spot_price_history(StartTime=start,
                                                        InstanceTypes=instance_types,
                                                        ProductDescriptions=['Linux/UNIX (Amazon VPC)'])
    if len(price_dict.get('SpotPriceHistory')) > 0:
        PriceHistory = namedtuple('PriceHistory', 'price timestamp')
        price_list = [PriceHistory(round(float(item.get('SpotPrice')), 3), item.get('Timestamp'))
                      for item in price_dict.get('SpotPriceHistory')]
        price_list.sort(key=lambda tup: tup.timestamp, reverse=True)
        # Add a small premium (here one cent) to the most recent spot price
        bid_price = round(float(price_list[0][0] + .01), 3)
        return bid_price
    else:
        raise ValueError('Invalid instance type: {} provided. '
                         'Please provide correct instance type.'.format(instancetype))
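A quick usage example (instance type and region are just placeholders; it assumes boto3 credentials are configured):
bid = get_bid_price('r5.2xlarge', 'us-east-1')
print('Suggested bid price: ${}/hour'.format(bid))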

How to get latest Snapshot for a volume in AWS using API

I want only the latest snapshot for a specific volume.
response_v=boto3.client("ec2").describe_snapshots(Filters=[{"Name":"volume-id","Values":["vol-fffffffffff"]}])
How can it be done?
It looks like the describe_snapshots method returns the newest one first but you really shouldn't count on that.
I think you can safely rely on the StartTime field, looking for the greatest value for all snapshots returned.
Snapshots occur asynchronously; the point-in-time snapshot is created immediately
Because of that, the "largest" StartTime will be the latest snapshot.
I wrote this bit of code to print the snapshot_id with the latest snapshot start time. My python-fu is not the greatest but this works.
import boto3
import datetime
import pytz

utc = pytz.UTC
starttime = datetime.datetime(1, 1, 1, tzinfo=utc)
snap_id = ""
volume_id = "<put your volume id here or write something more elegant to pass it in>"
region = 'us-east-1'

session = boto3.Session(profile_name='default')
ec2 = session.client('ec2', region_name=region)
response = ec2.describe_snapshots(Filters=[{"Name": "volume-id", "Values": [volume_id]}])
# print(response['Snapshots'])

for snap in response['Snapshots']:
    if snap['StartTime'] > starttime:
        snap_id = snap['SnapshotId']
        starttime = snap['StartTime']

print(snap_id)
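A slightly more compact variation on the same idea (my own addition, not from the original answer): since StartTime values are comparable, max() can pick the latest snapshot directly, assuming the response contains at least one snapshot:
# The snapshot with the greatest StartTime is the most recent one
latest = max(response['Snapshots'], key=lambda s: s['StartTime'])
print(latest['SnapshotId'])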
References
https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/ebs-creating-snapshot.html
